This chapter describes the prospectively designed protocol that the University of Alberta Evidence-based Practice Center (UAEPC) used to synthesize the evidence on pain management interventions following hip fracture. We describe the topic refinement process used to develop the key questions and outline the literature search strategy, the selection process for identifying relevant articles, the process for extracting data from eligible studies, the methods for assessing the methodological quality of individual studies and for rating the overall body of evidence, and our approach to data analysis and synthesis.

Topic Development

The UAEPC was commissioned to conduct a preliminary literature review to gauge the availability of evidence and to draft the key research questions for a full comparative effectiveness review (CER). In consultation with the Agency for Healthcare Research and Quality (AHRQ) and the Scientific Resource Center, a Technical Expert Panel (TEP) was invited to provide input into the development of the key questions and the scope of the report. Initial questions were posted on the AHRQ Web site for public comment. After reviewing the public comments, we finalized the key questions and submitted them to AHRQ for approval.

The TEP was subsequently invited to provide high-level content and methodological expertise throughout the development of the CER. The names of technical experts are available in Appendix A.

Search Strategy

The research librarian, in collaboration with the research team, developed and implemented search strategies designed to identify evidence relevant to the key questions (Appendix B).

For the questions on efficacy and effectiveness, we conducted comprehensive searches in the following electronic databases: AMED (Allied and Complementary Medicine); Global Health; International Pharmaceutical Abstracts; BIOSIS Previews; CINAHL (Cumulative Index to Nursing & Allied Health Literature); Academic Search Elite and Health Source: Nursing and Academic Edition; Cochrane Complementary Alternative Medicine and Pain Database; Cochrane Database of Systematic Reviews; Database of Abstracts of Reviews of Effects; EBM Reviews – Cochrane Central Register of Controlled Trials; Embase; Global Health Library; MEDLINE; Pascal; PeDRO (The Physical Therapy Evidence Database); ProQuest Dissertations and Theses–Full Text; Scopus; Web of Science. For the questions on adverse effects, in addition to the above databases, we also searched TOXLINE (Appendix B-1 to B-15).

To identify literature from symposia proceedings, we searched Conference Papers Index (1982 to 2010), OCLC PapersFirst (1993 to 2010), and ScienceDirect Tables of Contents for select journals (Appendix B). We also hand searched proceedings for the following associations: American Geriatrics Society, American Physical Therapy Association, American Society of Regional Anesthesia and Pain Medicine, European Society of Regional Anesthesia, European Society of Anesthesiology, and International Anesthesia Research Society (Appendix B-16 to B-19).

Unpublished studies and studies in progress were identified by searches of clinical trials registers (ClinicalStudyResults.org; ClinicalTrials.gov; Current Controlled Trials; ICTRP Search Portal; IFPMA Clinical Trials Portal; UMIN-CTR Clinical Trials) (Appendix B-20 to B-25), by contacting experts in the field, and by contacting authors of relevant studies.

The reference lists of relevant reviews and guidelines were screened to identify potential studies for inclusion. Original studies that met the inclusion criteria for this review were also searched for citing studies using Scopus Citation Tracker.

Search terms were selected by scanning search strategies of systematic reviews on similar topics and by examining index terms of potentially relevant studies. A combination of subject headings and text words was adapted for each electronic resource. These included terms for hip fracture (fracture* and (hip or intertrochanter* or petrochanter* or subtrochanter* or intracapsular or extracapsular or petrochant* or trochant* or hip or “femoral neck”)) and pain terms (pain* or heal or healing or therap* or recover* or "quality of life" or rehabilitat* or "drug therapy" or pharmacological or acupunct* or acupress* or traction or "electrical stimulation" or "passive motion" or morphine or acetaminophen or paracetamol or tylenol or anesth* or analges*). All searches were restricted to studies published from 1990 onward. No language or study design restrictions were applied. The detailed search strategies for each database are presented in Appendix B. The original searches were conducted between July 9 and July 27, 2009. On May 6, 2010, and December 16, 2010, the searches were updated using the original search strategies in Ovid MEDLINE, Embase, Cochrane Central Register of Controlled Trials, PASCAL, CINAHL, Scopus, DARE, and ClinicalTrials.gov.

Results from the literature searches were entered into Reference Manager 11.0.1 (Thomson Reuters, Carlsbad, CA).

Study Selection

The results of the electronic literature searches, hand searches, and expert-nominated records were screened using a two-step process. We included studies published as full-text manuscripts, conference abstracts, or other grey literature, with no language restrictions. Research published before 1990 was excluded because surgical procedures and medical care for this patient population in North America (particularly aggressive postsurgical mobilization) have changed, and earlier research may not be applicable to current care.

Study selection was based on an a priori set of criteria for inclusion and exclusion of studies, including study design, patient population, interventions, and outcome measures (Table 1). First, two reviewers independently screened the titles and abstracts (level I screening) to determine whether an article met the broad inclusion/exclusion criteria for study design, population, and intervention. Each reviewer rated each article as include, exclude, or unclear. Records rated as “include” or “unclear” by at least one reviewer were advanced to level II screening. The full-text versions of all potentially relevant articles were retrieved for independent formal review by two reviewers, who applied a priori eligibility criteria using a standardized screening form developed and piloted by the review team. Discrepancies regarding inclusion or exclusion of a study were resolved through discussion and consensus or, if consensus could not be reached, by third-party adjudication. Reviewers were not masked to the study authors, institution, or journal.38
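
The level I advancement rule is deliberately liberal: a record is dropped only when both reviewers exclude it. The minimal Python sketch below encodes that rule and the level II disagreement check; the names `Rating`, `advances_to_level_ii`, and `needs_adjudication` are hypothetical and do not correspond to any tool used in the review.

```python
from enum import Enum


class Rating(Enum):
    INCLUDE = "include"
    EXCLUDE = "exclude"
    UNCLEAR = "unclear"


def advances_to_level_ii(reviewer_a: Rating, reviewer_b: Rating) -> bool:
    """Level I: advance if at least one reviewer rated the record include or unclear."""
    return any(r in (Rating.INCLUDE, Rating.UNCLEAR) for r in (reviewer_a, reviewer_b))


def needs_adjudication(include_a: bool, include_b: bool) -> bool:
    """Level II: a disagreement goes to discussion, then third-party adjudication."""
    return include_a != include_b


if __name__ == "__main__":
    print(advances_to_level_ii(Rating.EXCLUDE, Rating.UNCLEAR))  # True
    print(advances_to_level_ii(Rating.EXCLUDE, Rating.EXCLUDE))  # False
```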

Table 1. Inclusion and exclusion criteria.


Assessment of Methodological Quality of Individual Studies

The risk of bias of the included trials was assessed using the Cochrane Collaboration’s Risk of Bias (RoB) tool39 for randomized controlled trials (RCTs) and nonrandomized controlled trials (nRCTs). The methodological quality of cohort and case-control studies was assessed using the corresponding versions of the Newcastle-Ottawa Scale (NOS)40 for each design. Decision rules regarding application of the tools were developed a priori by the research team. For RCTs and nRCTs, we performed a domain-based risk of bias assessment according to the principles of the RoB tool. The domains were: (1) sequence generation (e.g., was the allocation sequence adequately generated?); (2) allocation concealment (e.g., was allocation adequately concealed?); (3) blinding of participants, personnel, and outcome assessors (e.g., was knowledge of the allocated intervention adequately prevented during the study?); (4) incomplete outcome data (e.g., were incomplete outcome data adequately addressed?); (5) selective outcome reporting (e.g., were reports of the study free of suggestion of selective outcome reporting?); and (6) other sources of bias (e.g., was the study apparently free of other problems that could put it at a high risk of bias?). Other sources of bias included baseline imbalances, source of funding, early stopping for benefit, and appropriateness of crossover design. For cohort and case-control studies, the NOS uses a “star system” in which a study is judged on three broad perspectives: (1) the selection of the study groups; (2) the comparability of the groups; and (3) the ascertainment of either the exposure or the outcome of interest for case-control or cohort studies, respectively.

Two reviewers from a four-person team (AMAS, MH, MK, KW) independently performed quality assessment of the included studies, with disagreements resolved through discussion or third-party adjudication, as needed.

Data Extraction

Published data were independently double-extracted by members of the research team (AMAS, MH, MK, KW, SM). Standardized data extraction forms were developed in Microsoft Word (Microsoft Corporation, Redmond, WA; Appendix C). The forms were piloted with three studies41-43 and any issues identified were resolved. We extracted data on the following: general study characteristics (e.g., study design); population characteristics (e.g., age, sex); interventions and dosing regimens; numbers of patients allocated to relevant treatment groups; outcomes measured, method of ascertainment, and the results of each outcome, including measures of variability, by relevant intervention arm. Funding source, if reported, was also recorded.

When there were multiple reports of the same study, we cited the primary or most relevant report and extracted from companion reports only data not available in that report. Corresponding authors were contacted to clarify data and to obtain missing data. All data were imported into Microsoft Excel (Microsoft Corporation, Redmond, WA) for data management.

Dichotomous data were extracted as the number (n) of participants with events and the total number of participants (N). Continuous outcomes were extracted as the mean with the accompanying measure of variability for each treatment group, or as a mean difference (MD) between treatments, depending on the method of outcome measurement (e.g., scale, scoring system). Continuous data were analyzed as post-treatment scores or as absolute differences (change scores) from baseline.44 Because multiple scales and scoring systems were used to measure the outcomes (e.g., pain scores), we extracted the scale and the type of analysis used in each study in addition to the summary data and measure of variability (Appendix C). For all outcomes (e.g., delirium, hypotension), we used the definitions reported by the authors of individual studies.

When data were available only in graphical format, they were extracted from the graphs using the distance measurement tool in Adobe Acrobat 8 Professional (Adobe Systems Inc., San Jose, CA). When the measure of variability for a continuous outcome was not reported, it was calculated from the reported p-value or, if that was not available, imputed from other studies in the same analysis. When data for multiple relevant followup or observation periods were reported, only the data for the period showing the greatest improvement in the intervention arm were extracted. When studies included multiple relevant treatment arms, data from all arms were extracted. We recorded the specific intervention, dosage, and dosing interval of each arm to determine whether arms were clinically appropriate for pooling. For the purpose of this review, acute outcomes (mortality, acute pain, and delirium) were defined as those occurring up to 30 days postfracture.
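
Back-calculating a standard deviation from a p-value is commonly done by inverting a two-sample t-test. The sketch below assumes the p-value is exact (not "p < 0.05"), two-sided, and derived from an independent-samples t-test with a common SD; the text does not state which test underlay each reported p-value. To stay within the Python standard library, the t quantile is approximated with a Cornish-Fisher expansion of the normal quantile, an approximation chosen here rather than necessarily the one the review team used. Function names are illustrative; imputation from other studies is not shown.

```python
import math
from statistics import NormalDist


def t_quantile(prob, df):
    """Approximate Student t quantile via a Cornish-Fisher expansion of the
    normal quantile. Adequate for moderate df (roughly 10 or more) and
    p-values that are not extreme; less accurate for very small df."""
    z = NormalDist().inv_cdf(prob)
    g1 = (z ** 3 + z) / 4
    g2 = (5 * z ** 5 + 16 * z ** 3 + 3 * z) / 96
    g3 = (3 * z ** 7 + 19 * z ** 5 + 17 * z ** 3 - 15 * z) / 384
    g4 = (79 * z ** 9 + 776 * z ** 7 + 1482 * z ** 5 - 1920 * z ** 3 - 945 * z) / 92160
    return z + g1 / df + g2 / df ** 2 + g3 / df ** 3 + g4 / df ** 4


def sd_from_p_value(mean_diff, p_value, n1, n2):
    """Back-calculate a common within-group SD from an exact two-sided p-value,
    assuming it came from an independent-samples t-test."""
    df = n1 + n2 - 2
    t = t_quantile(1 - p_value / 2, df)  # |t| implied by the p-value
    se_diff = abs(mean_diff) / t          # SE of the mean difference
    return se_diff / math.sqrt(1 / n1 + 1 / n2)
```

For example, a reported MD of 2.0 with p = 0.05 and six patients per arm implies an SD of roughly 1.55.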

Data Analysis

Evidence tables and qualitative descriptions of results are presented for all included studies. Where appropriate, we conducted meta-analyses to answer the key questions. Meta-analyses were performed in Review Manager 5.0 (The Cochrane Collaboration, Copenhagen, Denmark). For dichotomous outcomes, Review Manager allows pooling with one of the following statistical methods: Mantel-Haenszel (MH), inverse variance (IV), or Peto’s modified Mantel-Haenszel (Peto). For continuous outcomes, pooling is performed using IV. Both fixed-effect and random-effects models are available for these methods, except for Peto, which uses only a fixed-effect model. For this review, we pooled binary data using MH with a random-effects model (DerSimonian and Laird),45 except where fewer than one percent of participants experienced an event, in which case Peto’s odds ratio was calculated using a fixed-effect model.46 For continuous outcomes, we used IV with a random-effects model (DerSimonian and Laird).45 Chi-square tests were used to assess whether partitioning studies into subgroups significantly reduced heterogeneity; a p-value below 0.1 was considered significant. Forest plots were generated and presented for the primary outcomes whenever at least two trials contributed to the synthesis. For secondary outcomes, forest plots were presented only if there were at least five included studies.
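
The pooling choices above can be sketched in plain Python. The sketch assumes per-study 2×2 counts given as `(events_t, n_t, events_c, n_c)` and, for the random-effects step, effects on an additive scale (log odds ratios or mean differences) with known variances. It uses inverse-variance weights throughout the DerSimonian-Laird step; Review Manager's random-effects Mantel-Haenszel option may compute its heterogeneity statistic somewhat differently, so results need not match the software exactly. The one-percent switch to Peto mirrors the rule in the text, computed here across all studies in a comparison, which is an assumption. All function names are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling (DerSimonian-Laird) of study effects on an additive
    scale, e.g. log odds ratios or mean differences. Returns (pooled, se, tau2)."""
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2


def peto_odds_ratio(tables):
    """Fixed-effect Peto one-step odds ratio. Requires at least one event overall.
    tables: iterable of (events_t, n_t, events_c, n_c). Returns (OR, SE of log OR)."""
    o_minus_e = 0.0
    variance = 0.0
    for a, n_t, c, n_c in tables:
        n = n_t + n_c
        m = a + c  # total events in the study
        o_minus_e += a - n_t * m / n  # observed minus expected treatment events
        variance += n_t * n_c * m * (n - m) / (n ** 2 * (n - 1))  # hypergeometric variance
    return math.exp(o_minus_e / variance), 1 / math.sqrt(variance)


def choose_binary_method(tables):
    """Pick the pooling method using the one-percent event-rate rule from the text."""
    events = sum(a + c for a, _, c, _ in tables)
    total = sum(n_t + n_c for _, n_t, _, n_c in tables)
    if events / total < 0.01:
        return "Peto odds ratio, fixed-effect"
    return "Mantel-Haenszel odds ratio, random-effects (DerSimonian-Laird)"
```

For instance, two studies with log odds ratios of 0 and 1 and variances of 0.1 each give a between-study variance of 0.4 and a pooled log odds ratio of 0.5 with a standard error of 0.5.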

In the meta-analyses, RCTs and nRCTs were combined. Cohort studies were synthesized separately, as meta-analysis including both trials and cohort studies is controversial.47 For continuous outcomes measured with the same instrument, the MD was calculated with 95 percent confidence intervals (CI). When different instruments (e.g., different scales) were used, the standardized mean difference (SMD) was calculated instead. Dichotomous summary estimates were reported as odds ratios with accompanying 95 percent CI.
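
The three study-level summary measures named above can be computed as follows. This is a sketch assuming normal-theory 95 percent intervals. The SMD uses Hedges' small-sample correction together with one common large-sample variance approximation, which may differ slightly from the variance formula implemented in Review Manager. Function names are illustrative.

```python
import math

Z_95 = 1.959964  # two-sided 95 percent standard normal quantile


def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """MD (group 1 minus group 2) with a normal-theory 95 percent CI."""
    md = m1 - m2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return md, (md - Z_95 * se, md + Z_95 * se)


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """SMD as Hedges' g with an approximate 95 percent CI."""
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample bias correction J
    g = correction * (m1 - m2) / sd_pooled
    se = math.sqrt((n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2)))
    return g, (g - Z_95 * se, g + Z_95 * se)


def odds_ratio(a, n1, c, n2):
    """OR for a events out of n1 vs c events out of n2, with a 95 percent CI on
    the log scale. Assumes no zero cells (no continuity correction is applied)."""
    b, d = n1 - a, n2 - c
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), (math.exp(log_or - Z_95 * se), math.exp(log_or + Z_95 * se))
```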

Heterogeneity was quantified using the I² statistic,48 with an I² value of 75 percent or greater considered substantial and precluding pooling of studies. In the case of substantial statistical heterogeneity, if there were at least 10 studies in the analysis, we planned to explore heterogeneity through meta-regression, subgroup analyses, and sensitivity analyses. If fewer than 10 studies were included, we explored heterogeneity qualitatively through subgroup and sensitivity analyses. Effect modifiers considered important for explaining heterogeneity included specific intervention details (e.g., type and quantity), study design, and risk of bias. In addition, we conducted sensitivity analyses on studies with imputed data to determine whether the imputations affected the effect estimate or heterogeneity. A priori subgroup analyses included sex, age, race, body mass index, marital status, comorbidities, prefracture functional ability, and family distress.
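
The I² statistic is derived from Cochran's Q as I² = max(0, (Q − df)/Q) × 100 percent, where df is the number of studies minus one. The minimal sketch below computes both from inverse-variance weights and applies the 75 percent cutoff described above; the function names and the input format (effect estimates with their variances) are assumptions.

```python
def cochran_q_and_i2(effects, variances):
    """Cochran's Q (inverse-variance, fixed-effect) and I^2 in percent."""
    w = [1 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


def pooling_allowed(i2, threshold=75.0):
    """Apply the review's rule: I^2 of 75 percent or more precludes pooling."""
    return i2 < threshold
```

For example, two studies with effects of 0 and 1 and variances of 0.1 each give Q = 5 and I² = 80 percent, which would preclude pooling under this rule.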

More than one-fifth (22.1 percent) of the trials had multiple intervention arms comparing different doses or concentrations of the same intervention, or drugs of the same class. When appropriate, data from these arms were pooled before being included in the meta-analysis. Dichotomous arms were pooled by summing events and participants across arms, while continuous arms were pooled using the generic inverse variance method.
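
Combining arms before meta-analysis can be sketched as follows. Summing events and totals follows the text directly. For continuous arms, the text says only that generic inverse variance was used; the sketch assumes this means an inverse-variance weighted mean of the arm means, with each arm weighted by n/SD². The Cochrane Handbook also gives a formula that merges group means and SDs into a single group, which yields a different SD, so treat this as one possible reading. Function names are hypothetical.

```python
import math


def combine_binary_arms(arms):
    """arms: list of (events, total). Returns summed (events, total)."""
    return sum(e for e, _ in arms), sum(n for _, n in arms)


def combine_continuous_arms_iv(arms):
    """arms: list of (mean, sd, n). Inverse-variance weighted mean of the arm
    means (an assumed reading of 'generic inverse variance').
    Returns (combined mean, its SE, total n)."""
    weights = [n / sd ** 2 for _, sd, n in arms]  # 1 / SE^2 of each arm mean
    mean = sum(w * m for w, (m, _, _) in zip(weights, arms)) / sum(weights)
    se = 1 / math.sqrt(sum(weights))
    return mean, se, sum(n for _, _, n in arms)
```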

Dichotomous outcomes with zero events (i.e., no participant in any arm experienced the event) were not included in meta-analyses because the study-level effect could not be estimated; the results from these studies were reported in the narrative synthesis for the relevant intervention.

Potential publication bias was explored graphically with funnel plots for meta-analyses that included at least 10 studies. If bias was suspected, it was assessed quantitatively using the Begg adjusted rank correlation test and the Egger regression asymmetry test.49
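
Egger's test regresses each study's standardized effect (effect divided by its standard error) on its precision (one divided by the standard error); an intercept far from zero suggests funnel plot asymmetry. The sketch below returns the intercept and its t statistic with n − 2 degrees of freedom. Converting that to a p-value needs a t-distribution CDF (for example, from SciPy), which is omitted here to stay within the standard library, and the Begg rank correlation test is not shown. The function name and inputs are assumptions.

```python
import math


def egger_test(effects, ses):
    """Egger's regression asymmetry test by ordinary least squares of
    effect/SE on 1/SE. Returns (intercept, SE of intercept, t, df)."""
    if len(effects) < 3:
        raise ValueError("need at least three studies")
    y = [e / s for e, s in zip(effects, ses)]  # standardized effects
    x = [1 / s for s in ses]                    # precisions
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    df = n - 2
    s2 = sum(r ** 2 for r in resid) / df  # residual variance
    se_int = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))
    return intercept, se_int, intercept / se_int, df
```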

Applicability

Assessing applicability requires distinguishing effectiveness studies from efficacy studies. Effectiveness studies are conducted in primary care or office-based settings, use less stringent eligibility criteria, assess health outcomes, and have longer followup periods than most efficacy studies.50 The results of effectiveness studies are more applicable to the spectrum of patients in the community than those of efficacy studies, which usually involve highly selected populations. The applicability of the body of evidence was assessed following the PICOTS (population, intervention, comparator, outcomes, timing of outcome measurement, setting) format used to assess study characteristics. Clinically important outcomes and participant characteristics are reported in the results.

Rating the Body of Evidence

We evaluated the overall strength of the evidence for key outcomes using the AHRQ GRADE51 approach, which is based on the standard GRADE approach developed by the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group.52 The strength of evidence was assessed for the outcomes that the clinical investigators identified as most clinically important: acute pain, chronic pain, mortality (30-day), and the incidence of serious adverse effects (e.g., stroke, myocardial infarction, delirium, renal failure). The following four major domains were examined: risk of bias (low, medium, high), consistency (inconsistency not present, inconsistency present, unknown or not applicable), directness (direct, indirect), and precision (precise, imprecise).

Each key outcome for each comparison of interest was given an overall evidence grade based on the ratings for the individual domains. The overall strength of evidence was graded as “high” (indicating high confidence that the evidence reflects the true effect and further research is very unlikely to change our confidence in the estimate of effect); “moderate” (indicating moderate confidence that the evidence reflects the true effect and further research may change our confidence in the estimate of effect and may change the estimate); “low” (indicating low confidence that the evidence reflects the true effect and further research is likely to change our confidence in the estimate of effect and is likely to change the estimate); and “insufficient” (indicating that evidence is either unavailable or does not permit estimation of an effect). When no studies were available for an outcome or comparison of interest, the evidence was graded as insufficient. A detailed explanation of the parameters used to grade the evidence and their operationalization is provided in Appendix J. GRADEprofiler (GRADEpro) software (GRADE Working Group) was used, and the results were modified in accordance with the AHRQ GRADE model. The body of evidence was graded independently by two reviewers (AMAS, DD); disagreements were resolved through discussion.

Peer Review

Ten experts in the field (Appendix A) agreed to peer review the draft report and provide comments. Reviewer comments were considered by the UAEPC in preparation of the final report. All peer reviewer comments and the UAEPC disposition of comments were submitted to AHRQ for assessment and approval.