NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Andrews J, Yunker A, Reynolds WS, et al. Noncyclic Chronic Pelvic Pain Therapies for Women: Comparative Effectiveness [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2012 Jan. (Comparative Effectiveness Reviews, No. 41.)

  • This publication is provided for historical reference only and the information may be out of date.



Methods

Topic Development and Refinement

The topic for this report was nominated in a public process. We drafted the initial Key Questions (KQ) and analytic framework and refined them with input from key informants. After review from the Agency for Healthcare Research and Quality (AHRQ), the questions and framework were posted to a public Web site. The public was invited to comment on these questions.

After reviewing the public commentary, we drafted final KQs and submitted them to AHRQ for review. We identified technical experts on the topic of chronic pelvic pain in women in the fields of gynecology and women's health to provide assistance during the project. The Technical Expert Panel (TEP) contributed to the AHRQ's broader goals of (1) creating and maintaining science partnerships as well as public-private partnerships and (2) meeting the needs of an array of potential customers and users of its products. Thus, the TEP was both an additional resource and a sounding board during the project. The TEP included 5 members serving as technical or clinical experts. To ensure robust, scientifically relevant work, we called on the TEP to provide reactions to work in progress. TEP members participated in conference calls and discussions through e-mail to:

  • Refine the analytic framework and KQs at the beginning of the project;
  • Discuss the preliminary assessment of the literature, including inclusion/exclusion criteria;
  • Provide input on assessing the quality of the literature.

Analytic Framework

We developed the analytic framework (Figure 1) based on clinical expertise and refined it with input from our key informants and TEP members. The framework summarizes the process by which women with noncyclic chronic pelvic pain (CPP) make and modify treatment choices. Treatment choices include surgical or nonsurgical approaches and may lead to outcomes including changes in pain status (e.g., resolution of pain, continuing pain, continued need for pain medication), patient satisfaction, quality of life, or harms/adverse effects.

Numbers in circles within the diagram indicate areas in which the review key questions come into play.

Figure 1

Analytic framework. Abbreviations: BSO = bilateral salpingo-oophorectomy; CAM = complementary and alternative medicine; KQ = key question

Treatment choices may also not provide pain relief or improvements in functional status or quality of life, and women with CPP may undergo additional interventions after a treatment approach has failed. In addition, outcomes may vary by diagnosis in those patients receiving a confirmed diagnosis for the etiology of their CPP.

Literature Search Strategy

Databases

We employed search strategies provided in Appendix A to retrieve research on the treatment of CPP in women. Our primary literature search employed 4 databases: MEDLINE® via the PubMed interface, PsycINFO (psychology and psychiatry literature), EMBASE Drugs and Pharmacology, and the Cumulative Index to Nursing and Allied Health Literature (CINAHL) database. Our search strategies used a combination of subject heading terms appropriate for each database and key words relevant to CPP (e.g., chronic pelvic pain, pelvic pain). We limited searches to the English language and literature published since 1990, when laparoscopic techniques became more widely used.

We also manually searched the reference lists of included studies and of recent narrative and systematic reviews and meta-analyses addressing CPP, and we invited TEP members to provide additional citations.

Grey Literature

The AHRQ Scientific Resource Center also searched for information on the following specific medications and devices used to treat CPP. We requested grey literature information on these drugs and devices because they either are commonly used and have a number of known side effects or are beginning to be used in the CPP population and are not yet well reported in the published literature (e.g., aromatase inhibitors):

  • Medroxyprogesterone;
  • Gonadotropin-releasing hormone (GnRH) agonists (with or without add-back estrogen therapy, including buserelin, goserelin, leuprolide, and nafarelin);
  • Selective progesterone receptor modulators (SPRMs) (mifepristone and ulipristal acetate);
  • Selective estrogen receptor modulators (SERMs) (tibolone, raloxifene, clomiphene, and tamoxifen);
  • Aromatase inhibitors (anastrozole and letrozole); and
  • Transcutaneous electrical nerve stimulation (TENS).

The Scientific Resource Center sought grey literature in resources including the websites of the US Food and Drug Administration and Health Canada and clinical trials registries such as ClinicalTrials.gov. We also gave manufacturers of these medications and devices an opportunity to provide additional information.

Ongoing Research

To examine the direction of ongoing and recently completed research, we also searched ClinicalTrials.gov and the European Union Clinical Trials Register for CPP intervention studies.

Search Terms

Controlled vocabulary terms served as the foundation of our literature search in each database, complemented by additional keyword phrases. We also employed indexing terms when possible within each of the databases to exclude undesired publication types (e.g., reviews, case reports, news), items from non-peer-reviewed journals, and items published in languages other than English.

Our literature searches were executed between September 2010 and May 2011. Appendix A provides our search terms and the yield from each database. We imported all citations into an electronic database created using EndNote. Our search for ongoing research was conducted in July 2011 using the key words “chronic pelvic pain” in each trial registry and limiting to studies in process.

Process for Study Selection

For this review, the relevant population for all KQ was adult women (≥ age 18) with noncyclic or mixed cyclic/noncyclic CPP, which we defined as pain that has persisted for more than 3 months, is localized to the anatomic pelvis (lower abdomen below the umbilicus), and is of sufficient severity that it causes the patient to become functionally disabled or to seek medical care. Pain may sometimes occur in a cyclic pattern; however, a noncyclic component is always present. CPP as described throughout this review refers to noncyclic or mixed cyclic/noncyclic pelvic pain unless otherwise noted.

Inclusion and Exclusion Criteria

We developed criteria for inclusion and exclusion based on the patient populations, interventions, outcome measures, and types of evidence specified in the KQs and in consultation with the TEP. Table 1 summarizes criteria.

Table Icon

Table 1

Inclusion and exclusion criteria.

Study Population

Studies needed to provide adequate information to ensure that participants fell within the target age range and met the pain criteria. For studies whose populations included women under age 18, we retained the study if we could infer that at least 80 percent of the study participants were 18 or older. Similarly, some studies included women with cyclic chronic pelvic pain and women with noncyclic chronic pelvic pain. We retained studies with participants with both cyclic and noncyclic/mixed chronic pelvic pain if at least 80 percent of the population was composed of women with noncyclic/mixed chronic pelvic pain.

We also applied this criterion to studies including both women and men, retaining studies that included men if at least 80 percent of the study population was women with CPP. Where possible, we extracted data only on the population of interest (adult women with noncyclic/mixed CPP). We chose the 80 percent threshold because we considered studies in which the large majority of participants were within our target age range (18 and older), had noncyclic CPP, or included only a low proportion of men to provide data applicable to the population of adult women with noncyclic CPP.

Including fewer than 20 percent of participants with characteristics outside the review's inclusion criteria may introduce bias into the results, but not to such a degree that the results would not be useful. Where appropriate, we note in our discussion of studies that results apply to a heterogeneous age range or pain group or include data from some male participants.
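As a minimal sketch, the 80 percent retention rule above can be expressed as a simple eligibility check. The function name and example counts are ours for illustration and are not part of the review protocol:

```python
def retain_study(n_total, n_target):
    """Retention rule sketch: keep a study when at least 80 percent of
    its participants match the target population (women age 18 or older
    with noncyclic or mixed cyclic/noncyclic CPP)."""
    return n_target / n_total >= 0.80

# A hypothetical study of 60 participants, 50 of whom are adult women
# with noncyclic/mixed CPP, would be retained (50/60 ~ 83 percent).
print(retain_study(60, 50))  # True
print(retain_study(60, 40))  # False (40/60 ~ 67 percent)
```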

Sample Size

For studies addressing KQs 2 through 5, we excluded those with fewer than 50 total participants. We considered the following factors in choosing this threshold:

  • Prevalence of noncyclic CPP (Prevalence varies by population; to maximize acceptable study size, we set prevalence at 100 percent.)
  • Loss to followup (Loss varies by study; to maximize acceptable sample size, we assumed 0 percent.)
  • Placebo effect (Placebo effects are known to be from 30 to 50 percent in chronic pain studies.63-67)
  • Type I error, alpha level, or p value (We set at a standard of 5 percent.)
  • Desired statistical power level (We set at a standard of 0.80.)
  • Statistic (We used the two-tailed z-test and the t-test for sample size.)
  • Clinical effect size anticipated or clinically relevant reduction in pain (We considered 30 percent as a minimum. We selected a target of 30 percent based on published recommendations that propose that reductions in chronic pain intensity of at least 30 percent reflect moderate clinically important differences.68)
  • Sample size
    • Considering a null hypothesis of effect size of 30 percent, a study would need 176 subjects per group; a total sample size of 352 would be the smallest acceptable.
    • Considering a null hypothesis of effect size of 50 percent, a study would need 64 subjects per study group; a total sample size of 128 would be the smallest acceptable.

Therefore, a single study, with 100 percent of participants with noncyclic CPP, with no loss to follow-up, with a pain reduction in the placebo group of 30 percent, and a pain reduction of at least 60 percent in the intervention group would require a sample size of 350 patients.
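Assuming the per-group figures above come from a standard two-sample calculation at alpha = 0.05 (two-sided) and power 0.80, the arithmetic can be checked with a normal-approximation sketch, n = 2(z₁₋α/₂ + z₁₋β)²/d². The function name is ours; the report's 176 and 64 reflect a small additional t-distribution correction that the normal approximation omits:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per study arm for a two-sided,
    two-sample comparison at standardized effect size d."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_b = z.inv_cdf(power)          # ~0.84 for power = 0.80
    return ceil(2 * (z_a + z_b) ** 2 / d ** 2)

print(n_per_group(0.30))  # 175 per group (report: 176 with t correction)
print(n_per_group(0.50))  # 63 per group (report: 64 with t correction)
```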

Rather than require a sample size of 350, we set a conservative lower limit of 50 participants to allow for potential meta-analyses that aggregate smaller trials with sufficient power to produce a confidence interval excluding 1. Studies of chronic pelvic pain rarely share identical patient populations, interventions, or outcome measures; this heterogeneity across studies would be problematic, making adequately sized individual studies important.

To examine the effects of our sample size requirement of at least 50 participants with CPP, we re-reviewed the randomized controlled trials that were excluded from the review and had fewer than 50 participants with CPP. Most of these studies were also excluded on another basis. Of the studies with an N of less than 50 that otherwise would have met the inclusion criteria at the full-text phase, none matched another in population, comparators, or interventions. None of these small studies used the same intervention, and there was significant heterogeneity in the populations and in the outcomes reported. It would therefore not have been possible to combine any two or more of these small studies in a meta-analysis as part of the systematic review. Moreover, these small studies, all addressing different interventions, would not have provided substantive data for the review.

We did not address harms of surgical interventions in this review because the studies meeting our inclusion criteria would necessarily provide only fragmentary evidence of such harms. Most of the surgical interventions used for CPP are deployed in a broader context for other indications; a systematic review of the harms of these procedures would require a different and much larger search than the current review's assignment, protocol, and KQs dictated. Reporting only the harms represented in the select studies meeting our criteria for surgical intervention in CPP would present only a partial picture of the potential harms of surgery.

Study Design

We accepted study designs including controlled trials and prospective cohort studies addressing the effectiveness of surgical or nonsurgical approaches (KQ2, KQ4), outcomes if an etiology for CPP is identified (KQ3), or effectiveness of one intervention over another to treat persistent CPP (KQ5). We considered prospective cohort studies to be comparative studies, in which separate groups of participants received different interventions. Prospective cohort study designs could use contemporaneous controls or historic controls. We also accepted prospective or retrospective case series or cross-sectional studies with at least 100 participants with CPP and addressing the prevalence of comorbidities of interest (KQ1) or harms of nonsurgical therapies (KQ4).

We selected the comorbidities of interest based upon reporting in the CPP literature. We extracted data regarding a study's use of validated tools to diagnose comorbidities or the provision of an operational definition for a comorbid condition. As described below, we factored the use of a validated tool into our quality assessment of studies providing data on the selected comorbidities.

Language

To gauge the relevance of research published in other languages, we located non-English literature for the time period of interest using our MEDLINE search strategy and identified 168 citations. Twenty-nine of these citations appeared potentially relevant on a title scan. We reviewed the abstracts of 28 of these, and none met our review criteria. We believed that the one study for which we could not locate an abstract would not substantially alter the findings of the review and excluded non-English studies.

In addition, we excluded studies that:

  • addressed pelvic pain related to cancer or pregnancy, as the etiology of and treatment for these entities differ significantly from those of CPP related to other or unknown causes;
  • did not report information pertinent to the KQs;
  • were published prior to the year 1990 and the widespread use of laparoscopic techniques and introduction of medications such as serotonin reuptake inhibitors used to treat CPP; and
  • were not original research.

Screening of Studies

Once we identified articles through the electronic database searches, review articles, and bibliographies, we examined abstracts of articles to determine whether studies met our criteria. Two reviewers separately evaluated each abstract for inclusion or exclusion, using an Abstract Review Form (Appendix D). If one reviewer concluded that the article could be eligible for the review based on the abstract, we retained it for full text assessment.

Two reviewers independently assessed the full text of each included study using a standardized form (Appendix D) that included questions stemming from our inclusion/exclusion criteria. Disagreements between reviewers were resolved by a third-party adjudicator. The group of abstract and full text reviewers included expert clinicians (JA, SR, AY, FL) and health services researchers (RJ, NS).

Data Extraction and Data Management

The staff members and clinical experts who conducted this review jointly developed the evidence tables, which were used to extract data from the studies. We designed the tables to provide sufficient information to enable readers to understand the studies, including issues of study design, descriptions of the study populations (for applicability), description of the intervention, and baseline and outcome data on constructs of interest. Our outcomes of interest included:

  • Pain status (reduction in pain, pain recurrence, subsequent intervention for unresolved or worsening pain);
  • Functional status (activities of daily living, sexual functioning);
  • Quality of life;
  • Patient satisfaction with pain management; and
  • Harms or adverse effects of nonsurgical interventions.

The team abstracted several articles into evidence tables and then discussed the utility of the table design as a group. We repeated this process through several iterations until we decided that the tables included the appropriate categories for gathering the information contained in the articles. All team members shared the task of initially entering information into the evidence tables. Another member of the team also reviewed the articles and edited all initial table entries for accuracy, completeness, and consistency. The full research team met regularly during the article extraction period and discussed global issues related to the data extraction process.

Where available, we also captured data on potential risk factors related to CPP or conditions thought to occur commonly with CPP. These data included:

  • History of sexual or physical abuse;
  • History of pelvic surgery;
  • Pregnancy-related risk factors (e.g., history of Caesarean births, vaginal births, operative vaginal birth, genital tract trauma, pregnancy termination); and
  • History of comorbidities of interest (anxiety, depression, dysmenorrhea, fibromyalgia, headache, irritable bowel syndrome (IBS), interstitial cystitis/painful bladder syndrome (IC/PBS), low back pain, and sexual dysfunction).

This list of comorbidities represents conditions thought to occur frequently with CPP and was determined in consultation with our TEP.

The final evidence tables are presented in their entirety in Appendix C. Studies are presented in the evidence tables alphabetically by the last name of the first author within each year. When possible to identify, analyses resulting from the same study were grouped into a single evidence table.

Individual Study Quality Assessment

We used a components approach to assessing the quality of individual studies, following methods outlined in the EPC's Methods Guide for Effectiveness and Comparative Effectiveness Reviews.69 Decision rules regarding application of the tools were developed a priori by the research team. We developed separate quality assessment approaches for randomized controlled trials (RCTs), observational studies, and studies addressing the prevalence of comorbidities. Two reviewers independently assessed each study, with disagreements between assessors resolved via a third adjudicator.

We assessed each domain described below individually and integrated them for an overall quality level as described in the Determining Quality Levels section. We assessed studies as having “met” or “not met” a criterion; where relevant, criteria could also be judged as not applicable (NA) to a study. For the final integration of the assessment of quality, 3 levels were possible: good, fair, and poor.

We describe the individual quality components below and report individual quality assessments for each study in Appendix E.

RCTs

We assessed quality factors recommended in the Evidence-based Practice Centers' (EPC) Methods Guide for Effectiveness and Comparative Effectiveness Reviews and in the Cochrane Handbook.

Sequence generation. We assessed study randomization by considering the following questions:

  1. Was the assignment randomized?
  2. Was the method used to generate the sequence of randomization described and was it appropriate?

We considered the following elements in determining the appropriateness of a study's randomization methods: Were truly random techniques (e.g., computer-generated sequences with sequentially numbered opaque envelopes) used? Were technically nonrandom techniques (e.g., alternate days of the week) used?

Scoring. Studies providing a description of a truly random technique were assessed as “met” for this element.

Blinding. We considered four elements to assess blinding:

  1. Was the allocation to study groups (and interventions) adequately concealed from patients/participants?
  2. Was the allocation to study groups (and interventions) adequately concealed from investigators?
  3. Was the allocation to study groups (and interventions) adequately concealed from clinical providers/caregivers?
  4. Was the allocation to study groups (and interventions) adequately concealed from outcome assessors?

Scoring. We defined adequate concealment as reasonable attempts (e.g., non-investigators involved in allocation, appropriate sham treatments used, etc.) by investigators to conceal intervention allocation groups. We assessed these criteria as met if the study provided such evidence of blinding.

Incomplete outcome data addressed. We considered four elements to assess the completeness of outcomes data reporting:

  1. Was complete information about participant flow provided, such as CONSORT diagram or equivalent information (numbers at random assignment; numbers receiving intended intervention; numbers completing protocol; and numbers analyzed for primary outcome, drop-out, lost to followup)?
  2. Was an intention-to-treat analysis (analysis as assigned) conducted and reported appropriately?
  3. Were incomplete/missing outcome data adequately reported?
  4. Were missing outcome data managed by an accepted method?

Scoring. We considered the following methods of missing data management acceptable: last observation carried forward, mean/median imputation, worst outcome imputation, or longitudinal regression imputation.

Selective outcome reporting. We assessed this domain using a single question: Was the primary outcome planned and described in the Methods section?

Scoring. Studies describing an a priori primary outcome determination were assessed as meeting this criterion.

Other bias. We assessed whether the study was largely free of other bias by considering the following elements: Was the trial stopped early for benefit? Was there an extreme baseline imbalance? Was there a conflict of interest that posed a substantive, important threat to the validity of the results?

Scoring. We scored studies as meeting this criterion if there was no evidence of such biases.

Sample size and power. We assessed this domain by determining whether an a priori sample size calculation was provided for the primary outcome.

Scoring. We scored studies as meeting this criterion if evidence of a sample size calculation was provided.

Statistical analysis. We considered the suitability of a study's analysis using the following questions:

  1. Was statistical analysis appropriate for the study design performed?
  2. Were the statistical results reliable?

Scoring. We scored studies as having “met” these criteria if our judgment was that the statistical analysis and results were appropriate and reliable for the stated study design and outcome. A glaring inconsistency or statistical error would result in a score of “not met.”

Dropout proportion. We evaluated studies for this domain using the question: What proportion of enrolled participants assigned to an intervention declined to continue the assigned intervention?

Scoring. We considered studies with a dropout rate of less than or equal to 10 percent as having “met” this criterion. We assessed studies with a dropout rate greater than 10 percent, or an unreported rate, as having “not met” the criterion.

Follow-up. We assessed the adequacy of follow-up by determining what proportion of enrolled population was present or accessible at the time of the primary followup.

Scoring. We considered studies with loss to followup of 20 percent or less as having “met” this criterion. Studies with greater than 20 percent loss, or not reporting the percentage, were scored as having “not met” this criterion.
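The two attrition thresholds above can be sketched as simple checks; the function names are ours for illustration. Note that an unreported rate also scores “not met,” which a purely numeric check does not capture:

```python
def dropout_criterion_met(n_assigned, n_dropped):
    """Dropout domain: 'met' when no more than 10 percent of assigned
    participants declined to continue the assigned intervention."""
    return n_dropped / n_assigned <= 0.10

def followup_criterion_met(n_enrolled, n_followed):
    """Follow-up domain: 'met' when loss at the primary followup is
    no more than 20 percent of the enrolled population."""
    return (n_enrolled - n_followed) / n_enrolled <= 0.20

print(dropout_criterion_met(100, 10))   # True (10 percent dropout)
print(dropout_criterion_met(100, 11))   # False
print(followup_criterion_met(100, 80))  # True (20 percent loss)
```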

Observational Studies

For observational studies we considered these domains: (1) the selection of the study groups; (2) the comparability of the study groups; (3) the ascertainment and measurement of the exposure/intervention or the outcome of interest (for case-control and cohort studies, respectively); (4) avoidance of detection bias; and (5) methods for limiting bias and confounding.

For example, for a cohort study, the fundamental criteria included: representativeness of cohort, selection of nonexposed cohort, ascertainment of exposure, outcome of interest, comparability of cohorts, assessment of outcome, adequate duration of follow-up, and adequate follow-up of cohort. Other sources of bias would include baseline imbalances, source of funding, early stopping for benefit, and appropriateness of crossover design.

Selection of participants in study groups. We considered three elements to evaluate a study's risk of bias in the selection of study group participants:

  1. Were the characteristics of the participants/patients included in the study groups clearly described?
  2. Were the inclusion and/or exclusion criteria described?
  3. Were the criteria applied equally to all groups?

Scoring. We scored studies as having “met” this criterion if related data were provided.

Comparability of the study groups. We used the following questions to assess this domain:

  1. Was there an assessment of baseline comparability, with regard to confounders (disease status, risk factors, prognostic factors, case-mix adjustment) for the most important factors (attempts to balance the groups by design), and did this demonstrate comparability?
  2. Were concurrent controls used?

Scoring. We scored studies as having met these criteria if related data were provided.

Intervention description. We used the following questions to assess this domain:

  1. Was there a clear definition of the intervention?
  2. Was the measurement method of the intervention standard, valid, and reliable?

We considered the following elements in making a determination about these questions: Did all participants receive the same intervention? Were the interventions performed by the same person? Was the intervention measured equally in all study groups?

Scoring. Studies could be assessed as having “met” or “not met” these criteria. For question 2, we scored studies of pharmacologic interventions as NA.

Outcomes. We evaluated a study's measurement of outcomes using the questions:

  1. Was the method of outcome assessment standard, valid, and reliable?
  2. Was the follow-up duration long enough (≥12 weeks) for the outcomes to occur?

We considered whether references for measurement instruments were provided and whether authors indicated testing of an instrument in making determinations about these questions.

Scoring. Studies could be assessed as having “met” or “not met” these criteria.

Avoidance of detection bias. We used the following questions to assess avoidance of detection bias:

  1. Were the outcome assessors blind to the intervention/outcome status?
  2. If assessors were blinded, was concealment adequate?

Scoring. Studies could be assessed as having “met” or “not met” these criteria.

Outcome data reporting. We judged the quality of studies' outcome reporting using the two questions below.

  1. Were incomplete/missing outcome data adequately reported?
  2. Were the data managed by an accepted method?

We considered the following data management methods acceptable: last observation carried forward, mean/median imputation, worst outcome imputation, or longitudinal regression imputation.

Scoring. Studies could be assessed as having “met” or “not met” these criteria.

Selective outcome reporting. To assess this factor, we considered whether a primary outcome was planned and described in a study's methods.

Scoring. Studies could be assessed as having “met” or “not met” these criteria.

Other bias. We evaluated a study's handling of potential biases using the questions:

  1. Were methods appropriate for dealing with any design-specific issues such as recall bias, interviewer bias, etc.?
  2. Was there a substantive conflict of interest which posed an important threat to validity of the results?

We considered factors such as unclear reporting of findings in industry-sponsored trials and reporting interim versus final data (e.g., reporting only 6 week data in a completed 12 week study) as examples of substantive reasons for other bias.

Scoring. Studies could be assessed as having “met” or “not met” these criteria.

Sample size and power. We considered whether an a priori power calculation was provided for the primary outcome in assessing this element.

Scoring. We assessed studies as having “met” or “not met” these criteria.

Statistical analysis. We used the following questions to assess a study's statistical approach:

  1. Was a statistical analysis performed that was appropriate for the study design?
  2. Were the statistical results reliable?

Scoring. We scored studies as having “met” these criteria if our judgment was that the statistical analysis and results were appropriate and reliable for the stated study design and outcome. A glaring inconsistency or statistical error resulted in a score of “not met.”

Dropout proportion. We evaluated studies in this domain using the question: What proportion of enrolled participants assigned to an intervention (medication, cognitive behavioral therapy, etc.) declined to continue the assigned intervention? We considered the following factors in making this determination: Does the paper describe a comparison between dropouts and the whole group? Were the reasons for dropout or withdrawal reported? Were incomplete outcome data adequately addressed?

Scoring. We considered studies with a dropout rate of less than or equal to 10 percent as having “met” this criterion. We assessed studies with a dropout rate greater than 10 percent, or an unreported rate, as having “not met” the criterion.

Followup. We assessed the adequacy of followup with the question: What proportion of the enrolled population was present or accessible at the time of the primary followup? We evaluated the following factors in making a determination: Was loss to followup uneven across exposure groups? Did the study fail to report the number of participants available at followup? Were incomplete outcome data adequately addressed?

Scoring. We considered studies with no more than a 20 percent loss as having “met” this criterion. Studies with greater than 20 percent loss, or not reporting the percentage, were scored as having “not met” this criterion.

Confounding and effect modifiers. We evaluated observational studies for this domain using the following four questions:

  1. For observational studies, was the approach to identifying confounding factors described?
  2. Was there adequate adjustment for the potential confounding factors?
  3. For observational studies, was the approach to identifying effect modifiers described?
  4. Was there adequate reporting of potential effect modifiers?

We defined potentially confounding variables as variables that affect the outcome and are associated with the intervention/exposure but are not on the causal pathway under study. An unmeasured confounder may therefore bias the estimate of the effect of the intervention/exposure on the outcome. We considered effect modifiers to be factors that modify the effect of the putative causal factor(s) under study by altering the relationship between an independent variable and a dependent variable (outcome). The effect modifier neither explains nor obscures the relationship between the causal factor of interest and the outcome—instead it alters the relationship so that under differing conditions of the effect modifier, the relationship between intervention and outcome changes in magnitude or direction.

We also considered the following elements in assessing these variables: Was the candidate variable selection discussed/noted? Was the model-building approach described? How were continuous variables handled in models? Was there restriction in design or techniques (e.g., modeling; stratified, regression, or sensitivity analyses) to correct, control, or adjust for confounding factors?

Scoring. We assessed studies as having utilized appropriate (+) or inappropriate (-) approaches.

Studies Addressing the Prevalence of Comorbidities of Interest

We assessed factors including a study's sampling method description and adequacy, sample size, response rate, specification of inclusion criteria, reporting of the age of the study population, and use of validated diagnostic criteria or operational definition of diagnosis. We assessed studies for each of the following criteria and assigned a plus if the criterion was met and a minus if not:

  • Sampling method: The best sampling technique is random sampling, whereby a group of people are selected at random for study from a larger group (population). Each person is chosen entirely by chance, thereby reducing the likelihood of a selection bias favoring one group of people over another. Studies meeting our inclusion criteria for this question were largely intervention studies that also reported the prevalence of one or more comorbidities. If the sampling method was described, we assessed this criterion as met.
  • Sample size: The larger the sample, the narrower will be the confidence interval around the prevalence estimate, making the results more precise. We required that studies include at least 100 participants to be assessed as having met this criterion.
  • Response rate. Selection bias can occur if only a proportion of invited individuals participate in a survey. We set a minimum response rate of 70 percent in order for a study to be rated as having met this criterion.
  • Inclusion and exclusion criteria. Specifying inclusion criteria allows for comparability between different prevalence data reports. Criteria should comprise information about the age range and, if appropriate, gender and ethnic group of the targeted individuals. If the inclusion and exclusion criteria were specified, then we assessed this criterion as met.

We used the four criteria above to establish a baseline score of 0 to 4. We further considered whether a study used validated diagnostic criteria to assess comorbidities of interest or provided an operational definition for a given comorbidity:

  • Validated diagnostic criteria or operational definition. We sought studies addressing the prevalence of comorbidities that reported using validated diagnostic criteria (i.e., reference provided or discussion of testing of the instrument provided), if such criteria existed when the study was conducted. If no validated criteria existed for a given comorbidity (e.g., vulvodynia, dysmenorrhea, fibromyalgia, IC/PBS, complex regional pain syndrome, functional abdominal pain syndrome, low back pain, and headache) or a validated tool was not used, we required that studies report an operational definition to meet this criterion. We considered operational definitions broadly as statements explaining how the investigators defined the comorbidity (e.g., an explanation of dysmenorrhea as painful menstrual periods). We scored each comorbidity reported in a study separately, assigning 2 points if the comorbidity was diagnosed with validated criteria, 1 point if the study provided an operational definition for the comorbidity, and zero points if no explanation of the criteria for the comorbid diagnosis was provided.

Determining Quality Levels

For RCTs, we considered a “good” study as one that met all criteria. We considered studies assessed as not meeting criteria in 3 or more domains (e.g., sequence generation, sample size and power, etc.) as poor quality. We considered studies not meeting criteria in one or two domains as fair quality.

For observational studies, we considered those meeting all criteria as good quality studies; those assessed as not meeting criteria in one to 4 domains as fair quality; and those not meeting criteria in 5 or more domains as poor quality.
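Under these rules, a study's quality level follows directly from the number of domains in which it failed to meet criteria. The following sketch is our own illustration of the two mappings, not code from the review:

```python
def rct_quality(domains_not_met):
    """RCTs: good = all criteria met; fair = criteria not met in
    one or two domains; poor = criteria not met in 3 or more domains."""
    if domains_not_met == 0:
        return "good"
    return "fair" if domains_not_met <= 2 else "poor"

def observational_quality(domains_not_met):
    """Observational studies: good = all criteria met; fair = criteria
    not met in 1 to 4 domains; poor = criteria not met in 5 or more."""
    if domains_not_met == 0:
        return "good"
    return "fair" if domains_not_met <= 4 else "poor"
```

The two study types share the same structure but use different cutoffs, reflecting the larger number of quality domains assessed for observational designs.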

For studies addressing the prevalence of comorbidities the minimum possible score was zero, and the maximum possible score was 6. In practice, the lowest score was 3 and the highest score was 6.

We considered studies achieving 6 total points as good quality; those receiving 5 points as fair quality; and those receiving 4 or fewer points as poor quality. Table 2 provides more information about study quality levels.
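For the comorbidity prevalence studies, the total score combines the four baseline criteria (0 to 4 points) with the diagnostic-criteria points (0, 1, or 2) for a maximum of 6. A minimal sketch of this arithmetic, assuming the diagnostic points for a given comorbidity are simply added to the baseline score (names are ours, not from the protocol):

```python
def prevalence_quality(criteria_met, diagnosis_points):
    """criteria_met: number of baseline criteria satisfied (0-4):
    sampling method described, sample size >= 100, response rate
    >= 70 percent, inclusion/exclusion criteria specified.
    diagnosis_points: 2 = validated diagnostic criteria,
    1 = operational definition only, 0 = neither.
    Returns (total score, quality level)."""
    total = criteria_met + diagnosis_points  # possible range 0-6
    if total == 6:
        level = "good"
    elif total == 5:
        level = "fair"
    else:
        level = "poor"
    return total, level
```

For example, a study meeting all four baseline criteria and using a validated diagnostic instrument would score 6 points (good quality), while the same study offering only an operational definition would score 5 (fair quality).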

Table 2. Description of study quality levels.

Data Synthesis

There was substantial heterogeneity among studies reporting therapeutic results for women with CPP, including heterogeneity in population inclusion criteria, interventions, and outcome measures. Therefore, meta-analysis was not appropriate.

Grading the Body of Evidence for Each Key Question

We evaluated the overall strength of the evidence for the primary outcomes using the approach described in the EPCs' Methods Guide for Effectiveness and Comparative Effectiveness Reviews.69 We assessed the strength of evidence for key outcomes identified by the clinical investigators as most clinically important: pain status (reduction in pain, recurrence of pain); subsequent intervention for unresolved or worsening pain; and functional status (resolution/improvement of functioning). Secondary outcomes included patient satisfaction with pain management, quality of life, and harms or adverse events.

We examined the following 4 major domains: risk of bias (low, medium, high), consistency (inconsistency not present, inconsistency present, unknown or not applicable), directness (direct, indirect), and precision (precise, imprecise) (Table 3).

Table 3. Domains used to assess strength of evidence.

We assigned each key outcome for each comparison of interest an overall evidence grade based on the ratings for the individual domains. The overall strength of evidence could be graded as “high” (indicating high confidence that the evidence reflects the true effect and further research is very unlikely to change our confidence in the estimate of effect); “moderate” (indicating moderate confidence that the evidence reflects the true effect and further research may change our confidence in the estimate of effect and may change the estimate); “low” (indicating low confidence that the evidence reflects the true effect and further research is likely to change our confidence in the estimate of effect and is likely to change the estimate); or “insufficient” (indicating that evidence is either unavailable or does not permit estimation of an effect). When no studies were available for an outcome or comparison of interest, we assessed the evidence as insufficient. Two reviewers independently graded the body of evidence; disagreements were resolved through discussion or adjudication by a third reviewer.

Peer Review and Public Commentary

Peer reviewers and AHRQ representatives reviewed a draft of this evidence report, and the draft report was also posted to the AHRQ Effective Health Care Web site for public comment. A document addressing the disposition of the peer and public review comments received will be posted to the AHRQ Effective Health Care Web site within 3 months of posting the final report.
