PubMed Health. A service of the National Library of Medicine, National Institutes of Health.

LeBlanc E, O'Connor E, Whitlock EP, et al. Screening for and Management of Obesity and Overweight in Adults [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2011 Oct. (Evidence Syntheses, No. 89.)

Appendix A. Detailed Methods

Study Selection

Two investigators independently reviewed all abstracts and articles against the inclusion and exclusion criteria, resolving discrepancies by consensus. Articles excluded for not meeting the inclusion criteria or for poor quality are listed in Appendix D Tables 1–4. Inclusion and exclusion criteria are detailed in Appendix B Table 1 and summarized here.

Study design. We included only English-language randomized or controlled clinical trials evaluating the effectiveness and safety of weight loss interventions in adults. Large cohort or case-control studies reporting serious adverse effects related to weight loss interventions were included to assess harms only (key question [KQ] 4). All trials had to include a true control group that received no intervention. Specifically, an acceptable control group could not receive a personalized intervention, at-home workbook materials, or advice more often than annually, and could not take part in frequent weigh-ins (i.e., more often than every 3 months). A healthy lifestyle message was considered too similar to weight loss messaging to serve as an attention control.

Population and setting. We included trials conducted among adults (ages ≥18 years) who were obese or overweight. Populations had to be unselected, selected for low cardiovascular disease risk, or selected for increased risk of specified conditions (cardiovascular disease, hypertension, dyslipidemia, or type 2 diabetes). Trials limited to participants with cardiovascular disease were not included, although trials could include some participants with cardiovascular disease. We included trials conducted in settings generalizable to U.S. primary care, feasible to conduct in primary care, or feasible for referral from primary care, as well as trials conducted in commercial settings (e.g., Weight Watchers). We excluded trials conducted in hospitals, institutional settings, school-based programs, occupational settings, churches, and other settings deemed not generalizable to primary care, such as those with existing social networks among participants or with the ability to offer intervention elements that could not be replicated in a health care setting.

Intervention. We included only interventions focusing on weight loss, including behavioral-based, pharmacological (orlistat and metformin), or a combination of both. We excluded behavioral interventions that did not focus primarily on weight or that did not report weight-related outcomes, surgical interventions, primary prevention programs that did not involve a weight loss goal for all participants, and trials focusing on pharmacological agents other than orlistat or metformin.

Outcomes. We included multiple health outcomes: decreased morbidity from diabetes mellitus, cardiovascular disease, cancer, arthritis, asthma, and sleep apnea; improved depression; improved emotional function (scores on emotional subscales of quality of life instruments); physical fitness capacity or performance (not behavioral); physical functioning (scores on physical subscales of quality of life measures); disability (global measures of disability, such as activities of daily living); and mortality. Intermediate outcomes included a reduction in weight or adiposity (a required outcome). Acceptable measures included weight, relative weight, total adiposity measures, or change in any of these measures. Other intermediate outcomes included weight maintenance after an intervention had ended and metabolic consequences (e.g., glucose tolerance, blood pressure, dyslipidemia). Adverse outcomes included serious treatment-related harms at any time point after an intervention began (e.g., death, a medical issue requiring hospitalization or urgent medical treatment) or other treatment-related harms reported in trials. Outcomes reported 12 months or more after the start of the intervention were included. Trials of treatment-related harms had no minimum followup requirement.

Data Extraction and Quality Assessment

Two investigators independently reviewed 5,869 abstracts and 623 articles (Appendix B Figure 1) for inclusion and critically appraised all included articles using design-specific criteria (Appendix B Table 2) and USPSTF methods.125 The USPSTF has defined a three-category quality rating of “good,” “fair,” and “poor” based on specific criteria. Discrepancies in quality ratings were resolved by consultation with a third investigator. All studies rated as poor quality were excluded from the review.

Briefly, for KQs 1–3, we assessed the validity of the randomization and measurement procedures, attrition, similarity between groups in baseline characteristics and attrition, intervention fidelity, and statistical methods. Among other things, good-quality trials blinded staff members to participants' treatment assignments (or future treatment assignments) if those staff performed tasks related to assessment or randomization; had followup data on 90 percent or more of participants; reported group-specific followup, with a difference of less than 10 percentage points between groups; and described important details of anthropometric and other measurements, such as how participants were dressed, what type of scale was used, how the site for measuring waist circumference was determined, or how many times blood pressure was measured and how the measurements were combined. Trials were rated as “poor” if attrition in the treatment and control groups differed by more than 20 percentage points, if overall attrition was higher than 40 percent, or if they had other important flaws. If a study lasted more than 12 months, only data from time points with adequate followup were included. For example, if a study's attrition met our standards at 12 months but not at 24 months, only 12-month data were abstracted. However, we made an exception to this rule for outcomes reported as cumulative incidence. For example, we did not abstract 24-month weight or blood pressure data from a study that had high attrition at 24 months; however, we did abstract the incidence of diabetes over the entire study period if it was reported as cumulative incidence and attrition at 12 months met our quality criteria.202 All trials meeting quality criteria for KQs 1–3 were also examined for KQ 4 outcomes.

We also developed separate quality assessment procedures for trials that were not included for KQs 1–3 (because of quality issues or other inclusion criteria) but reported harms outcomes; as a result, some trials excluded from KQs 1–3 for poor quality were included for KQ 4. The quality rating of KQ 4-only studies focused specifically on the assessment and analysis of harms (and not other outcomes). We did not set minimum attrition standards for these studies, both because harms of treatment could appear at any time after treatment began and because, if medications had high rates of adverse events, attrition could be very high, and maintaining the same attrition standards would leave only a highly selected sample to be evaluated for harms. In trials that did not meet the attrition standards of KQs 1–3, we examined only cumulative harms outcomes (i.e., percent withdrawing from the trial due to adverse effects, percent experiencing any serious adverse effect, percent experiencing any adverse effect, and percent experiencing any gastrointestinal adverse effect). Because our KQ 4 standards focused only on factors specifically related to the assessment of harms, we did not distinguish between “good” and “fair” trials but simply rated them as “acceptable” or “poor.” A poor-quality study was one with a fatal flaw that made its harms data of questionable validity.

One investigator abstracted data from included studies into evidence tables, and a second investigator reviewed the abstracted data for accuracy. We abstracted prespecified study details into evidence tables that included the following items: study design; setting (location, target population, recruitment strategy); population characteristics (study inclusion and exclusion criteria, participant age, sex, race/ethnicity, and socioeconomic status, as defined by income or education); baseline health status (body mass index; percent with diabetes, hypertension, and dyslipidemia); intervention characteristics (aim/theory, intervention/control description, duration, incentives, and who administered the intervention); outcomes; and adverse events. Relevant outcomes for abstraction included anthropometric measures (weight/relative weight, central adiposity, overall adiposity), intermediate outcomes (lipids, glucose tolerance, blood pressure), and health outcomes (depression, decreased morbidity, physical fitness capacity, mortality). Complete evidence tables are included in Appendix C Tables 1–3.

For KQs 1–3, this review included 140 articles representing 61 unique trials, 27 of which were conducted in the United States.

In addition to evaluating the studies from KQs 1–3 for harms, we abstracted harms data from 25 additional weight loss studies (table of harms data studies not in main analysis). These studies were not included in KQs 1–3 for various reasons, including poor quality, short duration (<12 months), or an ineligible study design (not a controlled trial). For KQ 4, this review included 167 articles representing 85 unique trials.

Data Synthesis and Analysis

We synthesized evidence separately for trials of weight loss medications and trials of behavioral-based interventions. Behavioral and medication trials were displayed together in a single forest plot for each outcome, but results were pooled separately: the behavioral trials were pooled as one group, and each medication was synthesized on its own, given the medications' different mechanisms of action. Within each intervention type, trials were grouped according to the risk status of the study samples and then ordered by the intensity of the behavioral intervention within each risk status. We grouped the trials according to risk status as follows: 1) trials limited to people with known risk factors for cardiovascular disease (operationalized as hypertension, diabetes, or dyslipidemia and termed “CV risk” trials); 2) trials limited to people with elevated risk but without known disease (prehypertension, impaired glucose tolerance or elevated fasting glucose, borderline high total cholesterol, low-density lipoprotein, or triglyceride levels, low high-density lipoprotein levels, or abdominal obesity; termed “subclinical” trials); and 3) trials that either did not limit samples on the basis of cardiovascular risk or that excluded people with the risk factors described above (termed “unselected/low risk” trials).
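The three-way grouping by risk status can be sketched as a simple precedence rule. This is a minimal illustration, not the authors' procedure: the condition strings in `CV_RISK_CRITERIA` and `SUBCLINICAL_CRITERIA` are hypothetical stand-ins for a trial's enrollment requirements, and the sketch assumes that a trial requiring any known-disease criterion counts as “CV risk” even if it also requires a subclinical one.

```python
from enum import Enum


class RiskStatus(Enum):
    CV_RISK = "CV risk"
    SUBCLINICAL = "subclinical"
    UNSELECTED_LOW_RISK = "unselected/low risk"


# Hypothetical labels for conditions a trial REQUIRES for enrollment.
CV_RISK_CRITERIA = {"hypertension", "diabetes", "dyslipidemia"}
SUBCLINICAL_CRITERIA = {
    "prehypertension",
    "impaired glucose tolerance",
    "elevated fasting glucose",
    "borderline high total cholesterol",
    "borderline high LDL",
    "borderline high triglycerides",
    "low HDL",
    "abdominal obesity",
}


def classify_trial(required_conditions: set) -> RiskStatus:
    """Assign a trial to one risk-status group (hypothetical helper)."""
    if required_conditions & CV_RISK_CRITERIA:
        # Sample limited to people with known CV risk factors.
        return RiskStatus.CV_RISK
    if required_conditions & SUBCLINICAL_CRITERIA:
        # Sample limited to elevated risk without known disease.
        return RiskStatus.SUBCLINICAL
    # No risk-based restriction, or a low-risk sample that excluded these conditions.
    return RiskStatus.UNSELECTED_LOW_RISK
```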

We captured the intensity of the behavioral interventions differently in behavioral-based and medication trials. For behavioral-based interventions, we usually had enough detail to estimate the number of sessions offered in the first year of the intervention, and we used this continuous variable as our indicator of intensity in the forest plots. Medication trials typically provided limited detail about the behavioral interventions offered as adjuncts to medication management, but we were able to identify two levels of intensity: brief intervention only, comparable with what might be offered in primary care (labeled “LO” in the forest plots and referred to as “brief” in the text), and interventions more intensive than would likely be offered in primary care (labeled “HI” in the forest plots and referred to as “intensive” in the text). Trials with insufficient detail to determine intensity were labeled “NR” (not reported) in the forest plots. The “brief” interventions did not require participants to attend a specific session on diet; the three trials in this category offered handouts and regular visits with a physician while participants received the medication. The “intensive” counseling interventions generally involved regular contact (usually four to 12 sessions over 12 months) with a dietitian or counselor, most often with monthly medication monitoring and weigh-ins. Only one of the trials with 12 or more sessions explicitly reported discussing behavioral management principles with participants, whereas most of the trials with only four sessions did report providing some instruction in behavioral management principles. Thus, although 12 sessions is considerably more than four, we did not consider the 12-session interventions necessarily more intensive than the four-session interventions that included behavioral management, so we grouped them together under the label “intensive” (“HI” in the forest plots).
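A minimal sketch of the labeling rule for medication trials, assuming each trial has been reduced to one hypothetical field: the number of required dietitian or counselor sessions in the first year (`None` when the report lacks detail). This simplifies the judgment described above, which also weighed whether behavioral management principles were taught.

```python
from typing import Optional


def adjunct_intensity_label(required_counseling_sessions: Optional[int]) -> str:
    """Hypothetical helper: label a medication trial's behavioral adjunct.

    required_counseling_sessions: required dietitian/counselor sessions in the
    first 12 months, or None if the report gives insufficient detail.
    """
    if required_counseling_sessions is None:
        return "NR"  # not enough detail to judge intensity
    if required_counseling_sessions < 0:
        raise ValueError("session count cannot be negative")
    if required_counseling_sessions == 0:
        return "LO"  # "brief": handouts and physician visits only
    # "intensive": any required counseling contact; 4-session and
    # 12-session programs are deliberately grouped together.
    return "HI"
```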

We conducted random effects meta-analyses to estimate the effect size of weight loss interventions on intermediate health outcomes (adiposity, systolic and diastolic blood pressure, total cholesterol, high- and low-density lipoprotein cholesterol, triglycerides, and glucose). For continuous outcomes, we analyzed change in the outcome from baseline; for dichotomous outcomes, we analyzed risk ratios. In many cases, we also estimated the absolute risk difference through meta-analysis so that the number needed to treat could be calculated. For trials with multiple active treatment arms, we selected a single intervention arm. When change from baseline and its standard deviation were not reported, we calculated them from the information provided in the individual articles. We converted measurements into common units using standard conversion factors (Table 2).
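The pooling step can be illustrated with the DerSimonian-Laird estimator, a common default for random effects meta-analysis. The section does not name the estimator, so this is an assumed, minimal sketch rather than the review's exact computation, and the function names are hypothetical. It also shows how a pooled absolute risk difference converts to a number needed to treat.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(effects: Sequence[float], variances: Sequence[float]) -> Tuple[float, float, float]:
    """Hypothetical helper: random effects pooled estimate (DerSimonian-Laird).

    effects: per-trial effects (e.g., between-group difference in change from baseline)
    variances: per-trial sampling variances of those effects
    Returns (pooled estimate, standard error, tau^2).
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two trials with matching variances")
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)  # between-trial variance, truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]  # random effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


def risk_difference(events_t: int, n_t: int, events_c: int, n_c: int) -> Tuple[float, float]:
    """Hypothetical helper: risk difference (treatment minus control) and its variance."""
    p_t, p_c = events_t / n_t, events_c / n_c
    return p_t - p_c, p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c


def number_needed_to_treat(rd: float) -> int:
    """NNT from a pooled risk difference, rounded up by convention."""
    if rd == 0:
        raise ValueError("NNT is undefined when the risk difference is zero")
    return math.ceil(1.0 / abs(rd))
```

The per-trial risk differences from `risk_difference` can be passed to `dersimonian_laird` with their variances before converting the pooled value to an NNT.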

We assessed statistical heterogeneity among studies using standard chi-square tests and estimated its magnitude using the I² statistic.139 We considered an I² below 50 percent to represent low heterogeneity, 50 to 75 percent moderate heterogeneity, and above 75 percent high heterogeneity. Publication bias, i.e., asymmetry in the distribution of effect sizes with respect to their precision, was assessed using funnel plots and Egger's linear regression method140 when the number of studies was about 10 or more.141
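The heterogeneity statistics and the publication-bias test can be sketched in plain Python. This is an illustrative implementation of the standard textbook definitions (Cochran's Q with inverse-variance weights; Egger's test as an ordinary least-squares regression of the standardized effect on precision), not the review's Stata code; the function names are hypothetical, and a p-value for the Egger intercept would come from a t distribution with k − 2 degrees of freedom.

```python
import math
from typing import Sequence, Tuple


def cochran_q_and_i2(effects: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: Cochran's Q (chi-square, k-1 df) and I^2 in percent."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = 0.0 if q <= 0 else max(0.0, (q - df) / q) * 100.0
    return q, i2


def heterogeneity_label(i2: float) -> str:
    """Map I^2 (percent) onto the low/moderate/high bands described above."""
    if i2 < 50:
        return "low"
    if i2 <= 75:
        return "moderate"
    return "high"


def egger_intercept(effects: Sequence[float], standard_errors: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: Egger regression of y/se on 1/se.

    Returns (intercept, standard error of intercept); an intercept far from
    zero suggests funnel-plot asymmetry.
    """
    n = len(effects)
    if n < 3:
        raise ValueError("need at least three studies")
    x = [1.0 / s for s in standard_errors]  # precision
    z = [y / s for y, s in zip(effects, standard_errors)]  # standardized effect
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    s2 = sse / (n - 2)  # residual variance
    return intercept, math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
```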

Meta-regression was used to explore heterogeneity in effect sizes among the KQs 1–3 trials. Because of concerns about type I error, we limited most exploration of heterogeneity to a single outcome, weight loss. Some factors were explored across the entire body of trials, combining behavioral trials and all three medication types; others were examined separately in the medication trials only and in the behavioral trials only. Continuous variables were retained as continuous variables, and categorical variables were converted to one or more dummy variables.

A prominent source of clinical heterogeneity was population risk status. Thus, we created two dummy variables, using the unselected/low-risk category as the reference group, and included these variables in all meta-regression models. All regression models involving the full set of KQs 1–3 trials also included a variable to indicate whether the trial was a medication or behavioral-based intervention trial.
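The dummy coding described above can be sketched as a function that builds one row of the meta-regression covariate matrix per trial. The column order, category strings, and `extra` argument are hypothetical; the sketch shows only the coding, while the weighted regression itself would be fitted by meta-regression software.

```python
from typing import List, Sequence

# Hypothetical category strings; the reference level gets no dummy column.
RISK_LEVELS = ("unselected/low risk", "subclinical", "CV risk")
REFERENCE = "unselected/low risk"


def design_row(risk_status: str, is_medication: bool, extra: Sequence[float] = ()) -> List[float]:
    """Hypothetical helper: one covariate row for a trial.

    Columns: intercept, CV-risk dummy, subclinical dummy, medication indicator,
    then any additional covariates (e.g., percent retained at 12 to 18 months).
    """
    if risk_status not in RISK_LEVELS:
        raise ValueError(f"unknown risk status: {risk_status!r}")
    return [
        1.0,  # intercept (absorbs the reference category)
        1.0 if risk_status == "CV risk" else 0.0,
        1.0 if risk_status == "subclinical" else 0.0,
        1.0 if is_medication else 0.0,
        *map(float, extra),
    ]
```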

Another factor we explored was the participant identification approach. Trials that identified specific potentially eligible patients prior to recruitment and used individual outreach and screening for recruitment (referred to as “study-identified”) were contrasted with trials that used broad-based media approaches that required potential participants to contact study staff in order to be screened for study eligibility (referred to as “self-identified”). Trials that did not report enough detail to determine recruitment approach were assumed to be self-identified. Additional factors explored for the entire combined body of literature were: percent of participants retained at 12 to 18 months, whether the trials focused on weight maintenance as opposed to weight loss, whether primary care was the setting for either recruitment or the intervention, whether the trial was set in the United States, study quality rating (on a subjective scale of 1–4, where 1=barely acceptable and 4=good), and selected patient-level characteristics (average age, percent female, percent nonwhite, and baseline body mass index).

For behavioral trials, we also examined the number of sessions in the first year and, in separate models, the presence of each of the following intervention components: supervised physical activity sessions, group sessions, individual sessions, technology-based assessment or intervention, specific weight loss goal, spouse or family involvement, barriers to weight loss addressed, pros and cons of weight loss or similar motivational assessment, self-monitoring expected, use of incentives for weight loss or intervention participation, and support for weight loss or lifestyle maintenance after active intervention phase. The variables examined in the combined medication and behavioral trials were also examined separately in the behavioral subgroup. Number of sessions in the first year and patient risk status were included in all models.

Additional variables were explored separately for the medication and behavioral trials. For medication trials, we also examined the percent of participants who were retained after a run-in phase (scored as 100 if there was no run-in phase, and dropped from the analysis if a run-in phase was present but we could not determine the percent who dropped out), the specific type of medication, and whether the behavioral intervention was more intensive than would be delivered in primary care (see the intensity definitions above). The variables explored for the entire group of trials, listed above, were also examined separately in the medication trials. All meta-regressions of the medication trials controlled for medication type and population risk status. All analyses were performed using Stata 10.0 (StataCorp, College Station, TX).
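The run-in retention covariate follows a small scoring rule, sketched below under the assumption that each trial records whether it had a run-in and the percent retained (`None` if unknown). The function and variable names are hypothetical.

```python
from typing import Optional


def run_in_retention_score(has_run_in: bool, pct_retained: Optional[float]) -> Optional[float]:
    """Hypothetical helper: covariate for percent retained after run-in.

    100 when there was no run-in; the reported percent when known; None
    (trial dropped from this analysis) when a run-in existed but its
    dropout could not be determined.
    """
    if not has_run_in:
        return 100.0
    return pct_retained


# Hypothetical trials: (had a run-in?, percent retained if reported).
trials = [(False, None), (True, 82.0), (True, None)]
scores = [run_in_retention_score(h, p) for h, p in trials]
analyzed = [s for s in scores if s is not None]  # the third trial is dropped
```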

Meta-analysis decisions. Meta-analysis involves a number of decisions and calculations. This section details the main decision rules we developed for data abstraction and analysis and the formulas used to calculate missing statistics.

Selecting intervention arm. For trials with multiple intervention arms, we selected the arm most similar to the interventions in the other trials included in the meta-analysis, if applicable (e.g., most orlistat trials used a 120 mg dose, so if a trial included treatment arms using 120 mg and another dose, we selected the 120 mg arm); otherwise, we selected the most intensive arm. In one case, one treatment arm was diet-only and the other was exercise-only, and we used the diet-only arm.

Selecting number of participants. If a study did not report some form of data substitution for missing followup data (e.g., last observation carried forward) or an analysis that used all observations (e.g., random effects models, generalized estimating equations), we used the number of participants with followup in each group, if available; if not available, we used the number randomized. If the trial did report data substitution or an analysis technique such as those described above, we used the number randomized in each group unless numbers were given specifically for each analysis. For adverse events (KQ 4), when only a proportion and not a number was provided, we assumed the denominator to be the total number randomized.
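A minimal sketch of these sample-size rules; the parameter names are hypothetical, and the rounding in `adverse_event_count` is an assumption the section does not specify.

```python
from typing import Optional


def analysis_n(
    handles_missing_data: bool,
    n_randomized: int,
    n_with_followup: Optional[int] = None,
    n_for_analysis: Optional[int] = None,
) -> int:
    """Hypothetical helper: per-group N entered into the meta-analysis.

    handles_missing_data: True if the trial imputed missing followup (e.g., LOCF)
    or used an analysis keeping all observations (e.g., random effects models, GEE).
    """
    if handles_missing_data:
        # Prefer an analysis-specific N if given; otherwise the number randomized.
        return n_for_analysis if n_for_analysis is not None else n_randomized
    # Completers analysis: N with followup if reported, else N randomized.
    return n_with_followup if n_with_followup is not None else n_randomized


def adverse_event_count(proportion: float, n_randomized: int) -> int:
    """Events implied by a reported proportion, using all randomized as the denominator."""
    return round(proportion * n_randomized)  # rounding is an assumption
```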

Baseline values. If a trial reported values at run-in (prior to randomization) and at randomization (post-run-in), we used the baseline values at randomization. If a trial only reported change from before run-in, we calculated changes from that point but did not enter standard deviations.

Followup time. If a study had a 12-month followup, we used that time point in the meta-analysis. If a trial did not have a 12-month followup, we accepted outcomes with up to 18 months of followup, preferentially selecting the time point closest to 12 months if multiple followup times were reported.
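The followup rule reduces to picking the eligible time closest to 12 months. The sketch assumes the eligible window is 12 to 18 months inclusive (trials shorter than 12 months were excluded elsewhere); the function name is hypothetical.

```python
from typing import Iterable, Optional


def select_followup_month(reported_months: Iterable[float]) -> Optional[float]:
    """Hypothetical helper: followup time used in the meta-analysis.

    12 months if reported (distance zero); otherwise the time closest to
    12 among those from 12 to 18 months; None if no time is eligible.
    """
    eligible = [m for m in reported_months if 12 <= m <= 18]  # assumed window
    if not eligible:
        return None
    return min(eligible, key=lambda m: abs(m - 12))
```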

For weight maintenance trials (and those with a weight loss requirement during run-in), we considered baseline to be the beginning of the weight maintenance phase (randomization, for those trials with weight loss run-in). For calculating the number of sessions, we counted the number of sessions in the weight maintenance phase only. For estimating followup time, we counted time to followup from the end of the weight-loss phase for the outcome of weight loss. When entering 5% or 10% weight loss in maintenance trials, we accepted whatever was reported by the trial, which in all cases was counted from the beginning of the initial weight-loss phase.

Calculations. If a trial reported results separately for subgroups, we combined the subgroup scores to calculate a single overall score for the intervention group and for the control group. We used the following formulas to calculate combined means and standard deviations:292
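A minimal sketch of a standard subgroup-combination formula (as given, for example, in the Cochrane Handbook); whether it matches the cited formulas exactly is an assumption. For subgroups with sizes N1, N2, means M1, M2, and SDs SD1, SD2: N = N1 + N2, M = (N1·M1 + N2·M2)/N, and SD = sqrt{[(N1 − 1)SD1² + (N2 − 1)SD2² + (N1·N2/N)(M1 − M2)²]/(N − 1)}. Applying it pairwise reproduces the pooled sample statistics for any number of subgroups. The function names are hypothetical.

```python
import math
from typing import Iterable, Tuple

Group = Tuple[int, float, float]  # (n, mean, sd)


def combine_two(a: Group, b: Group) -> Group:
    """Hypothetical helper: merge two subgroups' n, mean, and SD."""
    n1, m1, sd1 = a
    n2, m2, sd2 = b
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    # Within-group sums of squares plus a between-group term.
    ss = (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2 + (n1 * n2 / n) * (m1 - m2) ** 2
    return n, mean, math.sqrt(ss / (n - 1))


def combine_groups(groups: Iterable[Group]) -> Group:
    """Fold any number of subgroups into one (n, mean, sd)."""
    it = iter(groups)
    total = next(it, None)
    if total is None:
        raise ValueError("need at least one subgroup")
    for g in it:
        total = combine_two(total, g)
    return total
```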


We used standard calculations to convert standard errors and 95% confidence intervals to standard deviations:
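The standard relations for a single group mean (again following the Cochrane Handbook; that the review used exactly these is an assumption) are SD = SE·sqrt(N) and, from a 95% CI, SE = (upper − lower)/3.92. The sketch below uses hypothetical function names; CIs reported for a between-group difference need a different formula, which it does not cover.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Hypothetical helper: SD = SE * sqrt(n) for a single group mean."""
    return se * math.sqrt(n)


def sd_from_ci(lower: float, upper: float, n: int, z: float = 1.96) -> float:
    """Hypothetical helper: SD from a 95% CI around a single group mean.

    SE = (upper - lower) / (2 * z). For small samples, a t quantile with
    n - 1 degrees of freedom should replace z = 1.96.
    """
    se = (upper - lower) / (2 * z)
    return sd_from_se(se, n)
```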


If only baseline and followup values were reported, we calculated the crude mean change by subtracting the baseline mean from the followup mean for each group, and estimated the standard deviation using the following formula:

$$SD_{\text{change}} = \sqrt{SD_{\text{base}}^{2} + SD_{\text{post}}^{2} - 2 \times SD_{\text{base}} \times SD_{\text{post}} \times r_{\text{base,post}}}$$
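The crude change and the imputed change-score SD can be computed directly from the formula above. The function names are hypothetical, and any correlation passed in is illustrative rather than one of the review's Table 1 estimates.

```python
import math


def mean_change(baseline_mean: float, followup_mean: float) -> float:
    """Hypothetical helper: crude mean change (followup minus baseline)."""
    return followup_mean - baseline_mean


def sd_change(sd_base: float, sd_post: float, r: float) -> float:
    """Hypothetical helper: imputed SD of the change score.

    r is the assumed correlation between baseline and followup values.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    return math.sqrt(sd_base ** 2 + sd_post ** 2 - 2 * sd_base * sd_post * r)
```

A lower r yields a larger imputed SD, which is why choosing somewhat low correlations is the conservative direction.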

In order to use this formula, we estimated the correlation between baseline and followup for each outcome. To do this, we examined studies that reported mean change as well as baseline and followup means, and used the formula above to determine the correlations in their samples. These studies were quite variable in the resulting correlations, the time of followup, the quality of the study, and the number of estimates we were able to find. Because of this variability, both in quality of the estimate and the absolute value of the correlations, we grouped like outcomes and used what we believed to be reasonable, somewhat conservative (lower) values for that set of outcomes. The final correlations used are listed in Table 1.
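Solving the same formula for r gives the correlation implied by a study that reports baseline, followup, and change SDs. This is an illustrative sketch with a hypothetical function name; because published SDs are rounded, the result can occasionally fall slightly outside [−1, 1] and should be checked before use.

```python
def implied_correlation(sd_base: float, sd_post: float, sd_change: float) -> float:
    """Hypothetical helper: r implied by reported baseline, followup, and change SDs.

    r = (SD_base^2 + SD_post^2 - SD_change^2) / (2 * SD_base * SD_post)
    """
    return (sd_base ** 2 + sd_post ** 2 - sd_change ** 2) / (2 * sd_base * sd_post)
```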

Table 1. Estimated Correlation Between Baseline and Followup for Analyzed Outcomes, Used in Calculation of Change Score Standard Deviations.


Other analyses. When summary means were calculated for groups of trials (such as average age among all behavioral trials), mean values were weighted by the number of participants randomized in the relevant treatment arms of the trial.
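The weighted summary mean is a sample-size-weighted average; a minimal sketch with a hypothetical function name:

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], n_randomized: Sequence[int]) -> float:
    """Hypothetical helper: mean of trial-level values weighted by N randomized."""
    total = sum(n_randomized)
    if total == 0:
        raise ValueError("total sample size is zero")
    return sum(v * n for v, n in zip(values, n_randomized)) / total
```

For example, two trials with mean ages of 40 and 50 years and 100 and 300 participants randomized give a weighted mean age of 47.5 years.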

Table 2. Conversion Factors.

