

Wang Y, Wu Y, Wilson RF, et al. Childhood Obesity Prevention Programs: Comparative Effectiveness Review and Meta-Analysis [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2013 Jun. (Comparative Effectiveness Reviews, No. 115.)

  • This publication is provided for historical reference only and the information may be out of date.



The methods for this comparative effectiveness review follow the methods suggested in the Agency for Healthcare Research and Quality (AHRQ) “Methods Guide for Effectiveness and Comparative Effectiveness Reviews” (available at http://www.effectivehealthcare.ahrq.gov/methodsguide.cfm). The main sections in this chapter reflect the elements of the protocol established for the comparative effectiveness review; certain methods map to the PRISMA checklist.41 We determined all methods and analyses a priori.

Topic Refinement and Protocol Review

We developed the Key Questions (KQs) with the input of a key informant panel, which included experts in childhood nutrition policy, academic clinicians treating obese children, representatives from public school systems, parents of obese children, representatives from professional societies focusing on nutrition and obesity, and staff from AHRQ and the Scientific Resources Center. The KQs focus on comparisons of methods for preventing obesity in children. In July 2011, AHRQ posted the KQs on its Effective Health Care Program Web site for 4 weeks of public comment, and we revised them as needed. We also recruited a Technical Expert Panel, which included experts on childhood obesity, primary care, obesity policy, and nutrition. These technical experts provided high-level expertise to the Evidence-based Practice Center during our development of the protocol for the comparative effectiveness review, and we discussed the KQs with them.

Key Definitions

Obesity and Overweight

Obesity is a medical condition in which excess body fat has accumulated to the extent that it may have an adverse effect on health. For children, obesity is defined as an age- and sex-specific body mass index (BMI) at or above the 95th percentile, and overweight as a BMI at or above the 85th percentile. However, different studies may have used different BMI references; for example, some studies in European countries define obesity using a 97th BMI percentile developed from country-specific data. Moreover, some studies may use other measures, such as the 90th percentile of waist circumference (to define central obesity), skinfold thickness, or percentage of body fat. Note that until recently the WHO and the US CDC recommended using the term “at risk of overweight” for what is now called “overweight,” and “overweight” for what is now called “obesity,” in children and adolescents.19,22,23
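As an illustration of these cut points, the classification rule can be sketched as follows. This is a minimal sketch in Python; the function name and the parameterized thresholds are ours, reflecting the point above that individual studies may use different BMI references.

```python
def classify_weight_status(bmi_percentile, obesity_cut=95.0, overweight_cut=85.0):
    """Classify a child's weight status from an age- and sex-specific
    BMI percentile, using CDC-style cut points by default.

    The cut points are parameters because studies may use different
    references (e.g., a 97th-percentile obesity cut in some European
    studies using country-specific data).
    """
    if bmi_percentile >= obesity_cut:
        return "obese"
    if bmi_percentile >= overweight_cut:
        return "overweight"
    return "not overweight"
```

For example, a child at the 96th percentile would be classified as obese under the CDC-style cut but only as overweight under a 97th-percentile reference.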

Interventions for Prevention of Childhood Obesity

Our team came to a consensus on the definitions of the following settings and types of interventions in order to categorize the studies that we identified in our literature search. We grouped studies by the predominant setting of the intervention as we anticipated that this would best meet the needs of the users of this report.

School-Based Interventions

School-based interventions are those studies that are carried out primarily in schools. Such interventions might also involve parents, as well as some activities at home (e.g., homework, students bringing home flyers).

Home-Based Interventions

Home-based interventions are those carried out in or through the child's home. For example, these interventions may aim to alter the foods purchased for home use or to promote family fitness.

Primary Care-Based Interventions

Primary care-based interventions are those carried out in or through the offices of a primary care practitioner, a clinic, or other health care entity delivering primary health care to children. Note that we classify school-based health care as a school-based intervention. Primary care-based interventions that include a health informatics component are classified as primary care-based interventions.

Childcare-Based Interventions

Child-care settings are those where children receive non-parental/non-custodial care, generally outside the home. We classify school-based after-care programs as school-based interventions; childcare interventions delivered in other settings are classified as childcare-based interventions.

Community-Based and Environment-Level Interventions

Community-based and environment-level interventions include those that result from policy, legislative, built environment, and economic (pricing or food subsidy) interventions. We classified school-based policies with the school-based interventions. Additionally, these interventions involve interaction with the community, a group of individuals that exists prior to the intervention and shares one or more common characteristics (such as YMCA or church groups).36

Consumer Health Informatics-Based Interventions

Consumer Health Informatics encompasses technologies focused on indirect, as opposed to face-to-face, contact with patients as the primary users of health information. This includes Web-based, phone-based, and video-based programs, games, and information storehouses.

Search Strategy

We searched the following databases for primary studies: MEDLINE® via PubMed®, Embase®, PsycINFO®, CINAHL®, and the Cochrane Library, through August 11, 2012. We did not apply any date limits to the search: PubMed catalogues articles back to 1966, the Cochrane Library to 1989, CINAHL to 1982, and Embase to 1974. We developed a search strategy for MEDLINE, accessed via PubMed, based on medical subject headings (MeSH®) terms and text words of key articles that we identified a priori (Appendix B). We reviewed the reference lists of all included articles, relevant review articles, and related systematic reviews to identify articles that might have been missed by the database searches. We did not request Scientific Information Packets from any manufacturers because we were not studying pharmaceuticals or devices.

We downloaded the results of the searches and imported them into ProCite® version 5 (ISI Research Soft, Carlsbad, Calif.). We scanned for exact article duplicates; author/title duplicates, and title duplicates using the duplication check feature in ProCite. We uploaded the articles from ProCite to DistillerSR (Evidence Partners, Ottawa, Ontario, Canada), a Web-based software package developed for systematic review and data management. We used this database to track the search results at the levels of title review, abstract review, article inclusion/exclusion, and data abstraction.

We conducted a grey literature search in ClinicalTrials.gov to identify unpublished research that was relevant to our review on July 23, 2012. The search strategies we used were comparable to those used in the MEDLINE search and are in Appendix B.

Study Selection

We aimed to identify studies describing the comparative effectiveness of interventions to prevent obesity (or excessive weight gain) in children and adolescents 2 to 18 years old, conducted in the United States or other countries with a very high Human Development Index based on the United Nations' report.42 We included only randomized controlled trials and non-randomized trials, as we expected observational studies on this topic to be confounded and unable to test causality. We included only articles published in English, but reviewed the abstracts of non-English language articles to assess their agreement with the results published in English. We did not exclude studies based on study size.

Studies were eligible for inclusion if they followed children for at least 1 year after the initiation of the intervention, or at least 6 months for school-based interventions, given the expectation that most studies would not observe children past the 9-month school year (see Table 2).

Table 2. Inclusion and exclusion criteria.


The studies needed to compare results from any intervention targeting obesity prevention to results from usual care, or another different intervention, or no intervention. We also intended to include in this review studies that described results from natural experiments, such as those that described outcomes from a community that implemented a food policy change, compared to another community that did not. We did not include other observational studies, such as cross-sectional or cohort studies. We differentiated natural experiments from other observational study designs by specifying that a natural experiment was the implementation of a policy or similar intervention at a population level.

For inclusion in this review, we required that the study report the attained differences between the intervention and control groups in the prevalence of obesity and/or overweight, in BMI or BMI distribution, or in other weight and adiposity measures such as waist circumference, percentage of body fat, or skinfold thickness.

We excluded studies that targeted only overweight or obese children or adolescents, and similarly excluded studies that targeted children on the basis of having a chronic medical condition like diabetes or heart disease. We excluded studies that expressly aimed to induce weight loss in the participants. We did not include studies that collected only qualitative results, such as from interviews or focus groups. We did not include studies published only in abstract form due to the sparseness of data in abstracts.

Trials identified in the grey literature search were required to meet the same inclusion criteria as studies identified in the regular searches.

Data Extraction

We used DistillerSR (Evidence Partners, 2010) to manage the screening and review process. We uploaded all applicable citations identified by the search strategies to the system.

Two independent reviewers conducted title scans. For a title to be eliminated at this level, both reviewers had to indicate that the study was ineligible. If the reviewers disagreed, they advanced the article to the next level, abstract review. Two investigators independently reviewed abstracts, and we excluded an abstract only if both investigators agreed that it met one or more of the exclusion criteria. We tracked and resolved differences between investigators regarding abstract inclusion or exclusion through consensus adjudication. Articles promoted on the basis of abstract review received an independent parallel review to determine whether we should include them in the review. We resolved differences by consensus adjudication.

We created standardized forms for data extraction (Appendix C). Each article received a double review by study investigators for data abstraction. The second reviewer confirmed the first reviewer's data abstraction for completeness and accuracy. We formed reviewer pairs that included personnel with both clinical and methodological expertise. A third reviewer audited a random sample of articles selected by the first two reviewers to ensure consistency in the abstraction of data from the articles. Reviewers were not masked to the authors, institution, or journal of each article.

Reviewers extracted information on general study characteristics, study participants, eligibility criteria, interventions, outcome measures, the method of ascertainment, and the outcomes, including measures of variability where available. We entered all information from the article review process into the DistillerSR database. We used the DistillerSR database to maintain the data, and then exported it into Microsoft Excel for the preparation of evidence tables.

Data extraction followed a similar process for the trials identified during the grey literature search. Two independent reviewers conducted title scans. For a title to be eliminated at this level, both reviewers had to indicate that the study was ineligible. If the reviewers disagreed, the article was advanced to the next level. All trials that advanced to level 2 (abstract review) were screened by two reviewers, and disagreements were adjudicated by a third reviewer.

Quality (Risk of Bias) Assessment of Individual Studies

We used the Downs and Black instrument (see Appendix C) to assess the risk of bias in the included studies.43 We applied it by focusing on the questions that we felt were most relevant to this body of literature. To be considered at low risk of bias, a study must have done all of the following: stated the objective clearly, described the main outcomes, described the characteristics of the enrolled subjects, described the intervention clearly, described the main findings, randomized the subjects to the intervention group, and concealed the intervention assignment until recruitment was complete. Additionally, the study must have at least partially described the distributions of potential principal confounders in each treatment group.

We categorized the studies as having low, moderate, or high risk of bias: (1) if we could not determine whether one of the above items was done, or it was not done, we considered the study to have at least a moderate risk of bias; (2) if a study definitively did not do two or more of the above items, we considered it to have a high risk of bias; (3) given the types of interventions, we did not require other items that are typically expected in a well-conducted randomized trial; that is, we did not require blinding, descriptions of loss to followup, or complete adverse event reporting for a study to be considered at low risk of bias. Studies with a high risk of bias were thought to have significant flaws that might have invalidated the results.
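The categorization rules above can be sketched as follows. This is an illustrative sketch only; the function name and the item labels are ours, not part of the Downs and Black instrument.

```python
def risk_of_bias(items):
    """Categorize a study's risk of bias from per-item judgments.

    `items` maps each required quality item (e.g., 'objective stated',
    'randomized', 'allocation concealed') to one of 'yes', 'no', or
    'unclear'.  Returns 'low', 'moderate', or 'high' per the rules
    described in the text.
    """
    not_done = sum(1 for v in items.values() if v == "no")
    unclear = sum(1 for v in items.values() if v == "unclear")
    if not_done >= 2:        # two or more items definitively not done
        return "high"
    if not_done or unclear:  # any item unclear or not done
        return "moderate"
    return "low"             # all required items done
```

A study missing a single item, or with an item that cannot be determined, lands in the moderate category; only a study that satisfies every required item is rated low risk of bias.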

Data Synthesis

For each KQ, we created a set of detailed evidence tables containing all information abstracted from eligible studies. The elements that we abstracted about the interventions included the behavior (e.g., diet and/or physical activity) and the mode of delivery for the intervention (e.g., education, a modification of the environment, instruction in self-management techniques). We abstracted data on weight-related or body composition outcomes (e.g., change in prevalence of obesity, change in BMI or BMI distribution in the population, changes in adiposity or other weight measures, prevalence of obesity or overweight), obesity-related clinical outcomes, adverse effects of the interventions, and intermediate outcomes (e.g., nutrition knowledge, food purchasing behaviors, calorie intake, diet composition, physical activity). We extracted information about the primary weight outcomes at the time points of 24 weeks (for school-based studies only), 52 weeks, between 54 and 104 weeks, and greater than 104 weeks.

We pooled outcomes quantitatively (conducted meta-analysis) when we had three or more randomized controlled trials with similar interventions in comparable settings. We first confirmed that the studies were sufficiently qualitatively homogeneous with respect to population characteristics, intervention, comparison, outcomes, and timing. For studies amenable to pooling, we calculated pooled mean differences using a DerSimonian and Laird random-effects model.44 We did not conduct meta-analyses of other measures of intervention effect, such as odds ratios or relative risks, because of the limited number of comparable studies reporting such results. The result of each meta-analysis contributed to our assessment of the precision of the estimate of the outcome, which we used in grading the strength of evidence.

We identified statistical heterogeneity between studies using a chi-squared test with a significance level of alpha less than or equal to 0.10; an I-squared statistic greater than 50 percent indicated substantial heterogeneity. We conducted all meta-analyses using Stata (Intercooled, version 11, StataCorp, College Station, Texas).
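The pooled mean difference, Cochran's Q, and I-squared described above can be sketched as follows. This is a minimal illustration of the DerSimonian and Laird method, not the Stata code actually used for the analyses; the function name and return values are ours.

```python
import math

def dersimonian_laird(effects, variances):
    """Pool study mean differences with a DerSimonian-Laird
    random-effects model; also report Cochran's Q and I-squared.

    `effects` and `variances` are per-study mean differences and their
    variances.  Returns (pooled estimate, standard error, Q, I^2 in %).
    """
    w = [1.0 / v for v in variances]                     # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # Method-of-moments estimate of between-study variance (tau^2)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights incorporate tau^2
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, q, i2
```

When the studies are homogeneous, tau-squared collapses to zero and the random-effects estimate reduces to the fixed-effect weighted mean; as heterogeneity grows, I-squared rises and the study weights become more nearly equal.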

We reviewed the studies for outcomes by key subgroups including outcomes reported by sex, age, or racial group, and reported the results separately by subgroups and pooled the data where appropriate.

We describe the evidence for the following outcomes: prevention of obesity or overweight (a combined outcome of all weight-related outcomes), intermediate outcomes, clinical outcomes, and adverse events. Because of the diversity of measures and the frequent lack of reported measures of variation, we did not calculate effect sizes. Rather, our conclusions indicate whether the intervention suggests benefit, no benefit, or unknown benefit. We could not explicitly state whether the reported effects met a clinically relevant threshold, as such a threshold is not well established in the obesity research community.

Strength of the Body of Evidence

In our results, we reported both the strength of evidence and the magnitude of effect (e.g., the difference in BMI change between the intervention and control groups), with strength of evidence as the primary focus; our meta-analyses reported magnitude of effect. After synthesizing the evidence, we graded the quantity, quality, and consistency of the best available evidence addressing each of our KQs by adapting an evidence grading scheme recommended in the Methods Guide for Conducting Comparative Effectiveness Reviews.56 In assigning evidence grades, we considered the four recommended domains: risk of bias in the included studies, directness of the evidence, consistency across studies, and precision of the pooled estimate or the individual study estimates.

We graded the evidence for each setting, by intervention and comparator, and then by outcome. For grading purposes, we grouped the interventions as: (1) all diet interventions, (2) all physical activity interventions, and (3) all combined diet and physical activity interventions. We assigned grades for all weight-related outcomes together, with each study contributing only one weight-related measure to the grade, by setting up a hierarchy of outcomes: BMI z-score, BMI, prevalence of obesity and overweight, percent body fat, waist circumference, and skinfold thickness. If a study measured both BMI z-score and body fat, we graded only the BMI z-score. We chose this hierarchy because these outcomes are closely correlated within an individual, particularly BMI and BMI z-score, and conclusions about the benefit of an intervention are unlikely to change with the addition of evidence grades for highly correlated outcomes. We graded six categories of intermediate outcomes, chosen because they are most likely to directly influence the weight outcomes: change in energy (caloric) intake, change in fruit and vegetable intake, change in fatty food intake, change in sugar-sweetened beverage intake, change in physical activity, and change in sedentary activity. We did not grade clinical outcomes, and we did not grade adverse events because there were too few studies overall to do so.
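The one-outcome-per-study rule can be sketched as a simple walk down the hierarchy; the list ordering follows the text, while the function name and outcome labels are illustrative.

```python
# Hierarchy of weight-related outcomes, highest priority first,
# as described in the text.
WEIGHT_OUTCOME_HIERARCHY = [
    "BMI z-score",
    "BMI",
    "prevalence of obesity/overweight",
    "percent body fat",
    "waist circumference",
    "skinfold thickness",
]

def select_graded_outcome(reported_outcomes):
    """Return the single weight-related outcome a study contributes to
    the evidence grade: the highest-ranked measure it reported, or
    None if it reported no weight-related outcome."""
    for outcome in WEIGHT_OUTCOME_HIERARCHY:
        if outcome in reported_outcomes:
            return outcome
    return None
```

For example, a study reporting both BMI z-score and percent body fat contributes only the BMI z-score, matching the rule stated above.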

We classified evidence pertaining to the KQs into four categories: 1) “high” grade (indicating high confidence that the evidence reflects the true effect, and further research is very unlikely to change our confidence in the estimate of the effect); 2) “moderate” grade (indicating moderate confidence that the evidence reflects the true effect, and further research may change our confidence in the estimate of the effect and may change the estimate); 3) “low” grade (indicating low confidence that the evidence reflects the true effect, and further research is likely to change our confidence in the estimate of the effect and is likely to change the estimate); and 4) “insufficient” grade (evidence is unavailable or there was only one study having more than a low risk of bias). We caution that a “high” strength of evidence grade is not necessarily an indicator of effectiveness – there can be strong evidence that an intervention is ineffective or even strong evidence of no effect.

We considered the body of evidence consistent in direction if 70 percent or more of the studies had an effect in the same direction (i.e., showed a desirable effect versus not). We did not require a minimum number of studies to apply this rule; for example, a body of evidence with two positive and one negative study would be graded as inconsistent. We identified all studies as providing direct evidence, since all of the studied interventions would directly affect one of our primary outcomes. We considered a study precise if the results for the given outcome were significant at a p value less than 0.05, or had narrow confidence intervals that excluded the null. If 70 percent or more of the studies that reported statistical significance had significant results, we considered the body of evidence precise. We did not require a minimum number of studies to apply this rule either; for example, a body of evidence with two precise and one imprecise study would be graded as imprecise, although we recognize that if the studies had been amenable to pooling, precision might have increased with pooling.
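The 70-percent rule, which applies to both the consistency and the precision determinations, can be sketched as follows (the function name and labels are ours):

```python
def meets_70_percent_rule(judgments, threshold=0.70):
    """Apply the 70-percent rule: True if at least `threshold` of the
    study-level judgments agree.  `judgments` is a list of labels, e.g.
    directions of effect ('benefit' vs. 'no benefit') or precision
    calls ('precise' vs. 'imprecise')."""
    if not judgments:
        return False
    most_common = max(judgments.count(j) for j in set(judgments))
    return most_common / len(judgments) >= threshold
```

This reproduces the worked examples in the text: two positive studies and one negative study agree only 2/3 of the time (about 67 percent), so the body of evidence is graded inconsistent, whereas three out of four agreeing studies (75 percent) clears the threshold.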

We applied a grading algorithm to the body of evidence to have consistent grading across questions. If we found two studies with low risk of bias that had consistent direction of outcomes and no studies with a low risk of bias with outcomes in the opposite direction, we considered this to be high strength of evidence. If we found one study with low risk of bias and two or more studies with a moderate risk of bias, all in a consistent direction, and no study with a low risk of bias with outcomes in the opposite direction, we also considered this high strength of evidence. If there were no studies with a low risk of bias and the moderate risk of bias studies were consistent or predominantly consistent (>70 percent), we considered this moderate strength of evidence. If there were no low risk of bias studies and the studies with moderate risk of bias were inconsistent, we considered this low strength of evidence; we likewise graded any weaker combination of evidence as low strength.
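The grading algorithm can be sketched as below. This is our illustrative reading of the rules: the text states only that an opposite-direction low risk-of-bias study precludes a "high" grade, so its fall-through to "low" here is an assumption, and the separate "insufficient" category (no evidence, or a single study above low risk of bias) is omitted.

```python
def strength_of_evidence(n_low_rob, n_moderate_rob, consistent,
                         low_rob_opposite=False):
    """Assign a strength-of-evidence grade.

    n_low_rob / n_moderate_rob: counts of studies at low / moderate
    risk of bias supporting the consistent direction; `consistent` is
    the result of the 70-percent rule; `low_rob_opposite` flags any
    low risk-of-bias study with outcomes in the opposite direction.
    """
    if low_rob_opposite:
        # Assumption: conflicting low risk-of-bias evidence falls
        # through to "low"; the text only says it precludes "high".
        return "low"
    if n_low_rob >= 2 and consistent:
        return "high"
    if n_low_rob == 1 and n_moderate_rob >= 2 and consistent:
        return "high"
    if n_low_rob == 0 and n_moderate_rob >= 1 and consistent:
        return "moderate"
    return "low"
```

So two consistent low risk-of-bias studies (with no opposing low risk-of-bias study) grade as high, while a consistent body of only moderate risk-of-bias studies tops out at moderate.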


Applicability

We assessed applicability separately for each KQ, guided by the PICOTS framework as recommended in the Methods Guide for Comparative Effectiveness Reviews of Interventions.52 We assessed whether there were features of the individual studies that limited the applicability of the studies' findings to the general population.

Peer Review and Public Commentary

We invited experts in childhood obesity prevention and management, obesity policy, and individuals representing stakeholder and user communities to provide external peer review of this comparative effectiveness review. AHRQ and an associate editor also provided comments. AHRQ posted the draft report on its website for 4 weeks to elicit public comment. We addressed all reviewer comments, revised the text as appropriate, and documented our responses in a disposition of comments report that we will make available 3 months after AHRQ posts the final review on its website.
