NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

A Pilot Study Using Machine Learning and Domain Knowledge To Facilitate Comparative Effectiveness Review Updating

Methods Research Reports

Investigators: SR Dalal, PhD, PG Shekelle, MD, PhD, S Hempel, PhD, SJ Newberry, PhD, A Motala, BA, and KD Shetty, MD, MS.

Southern California Evidence-based Practice Center
Rockville (MD): Agency for Healthcare Research and Quality (US); September 2012.
Report No.: 12-EHC069-EF

Structured Abstract


Comparative effectiveness reviews need to be updated frequently to maintain their relevance. Results of earlier screening efforts should be useful in reducing the screening of thousands of newer citations for articles relevant to efficacy/effectiveness and adverse effects (AEs).


We collected 14,700 PubMed® citation classification decisions from a 2007 comparative effectiveness review of interventions to prevent fractures in persons with low bone density (LBD). We also collected 1,307 PubMed citation classification decisions from a 2006 comparative effectiveness review of off-label uses of atypical antipsychotic drugs (AAP). We first extracted, from each MEDLINE® citation, explanatory variables related to key concepts, including the intervention, outcome, and study design. We then used these data to empirically derive statistical models (based on sparse generalized linear models with convex penalties [GLMnet] and gradient boosting machines [GBM]) that predicted inclusion in the AAP and LBD reviews. Finally, we evaluated model performance on the 11,003 PubMed citations retrieved for the updated LBD and AAP reviews.
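The variable-extraction step described above can be illustrated with a minimal sketch. It assumes each citation is available as a list of indexing terms and that each key concept is represented by a small vocabulary; the `CONCEPT_TERMS` lists and the `extract_features` helper are hypothetical and are not the term lists or code used in the report.

```python
# Hypothetical concept vocabularies (illustrative only; the report's actual
# term lists are not reproduced here).
CONCEPT_TERMS = {
    "intervention": {"alendronate", "risedronate", "diphosphonates"},
    "outcome": {"fractures, bone", "hip fractures"},
    "study_design": {"randomized controlled trial", "meta-analysis"},
}


def extract_features(indexing_terms):
    """Return one 0/1 indicator per key concept for a single citation."""
    # Normalize case and whitespace so matching is not sensitive to formatting.
    normalized = {term.strip().lower() for term in indexing_terms}
    return {
        concept: int(bool(normalized & vocabulary))
        for concept, vocabulary in CONCEPT_TERMS.items()
    }


# Example citation with three indexing terms (made up for illustration).
citation_terms = ["Alendronate", "Hip Fractures", "Aged"]
features = extract_features(citation_terms)
print(features)
```

Indicators of this kind could then serve as inputs to a penalized regression or boosting model; the model-fitting step itself is not shown here.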


The outcome measures were sensitivity (percentage of relevant citations correctly identified), positive predictive value (PPV, percentage of predicted relevant citations that were truly relevant), and workload reduction (percentage of screening avoided).
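As a minimal sketch of how these three measures relate to one another, the function below computes them from parallel lists of true relevance and classifier flags. It assumes that only flagged citations are screened by hand, so workload reduction is the unflagged share of all citations; the function name and the toy data are illustrative and do not come from the report.

```python
def screening_metrics(is_relevant, is_flagged):
    """Compute sensitivity, PPV, and workload reduction as fractions."""
    if len(is_relevant) != len(is_flagged):
        raise ValueError("inputs must have the same length")

    total = len(is_relevant)
    relevant = sum(bool(r) for r in is_relevant)
    flagged = sum(bool(f) for f in is_flagged)
    # True positives: citations that are both relevant and flagged.
    true_pos = sum(bool(r) and bool(f) for r, f in zip(is_relevant, is_flagged))

    nan = float("nan")
    return {
        "sensitivity": true_pos / relevant if relevant else nan,
        "ppv": true_pos / flagged if flagged else nan,
        # Assumption: unflagged citations are not screened manually.
        "workload_reduction": (total - flagged) / total if total else nan,
    }


# Toy data: 10 citations, 3 truly relevant, 3 flagged by the classifier.
truth = [True, True, False, False, False, False, True, False, False, False]
flags = [True, True, True, False, False, False, False, False, False, False]
metrics = screening_metrics(truth, flags)
print(metrics)
```

Lowering the flagging threshold generally raises sensitivity at the cost of PPV and workload reduction, which is the trade-off the results explore.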


GLMnet- and GBM-based models performed similarly, with GLMnet (results shown below) performing slightly better. The GLMnet-based model yielded sensitivities of 0.921 and 0.905 and PPVs of 0.185 and 0.102 when predicting articles relevant to the AAP and LBD efficacy/effectiveness analyses, respectively (using a threshold of p ≥ 0.02). GLMnet performed better when identifying AE-relevant articles for the AAP review (sensitivity = 0.981) than for the LBD review (sensitivity = 0.685). When attempting to maximize sensitivity, GLMnet achieved high sensitivities (0.99 for AAP and 1.0 for LBD) while reducing projected screening by 55.4 percent (1,990/3,591 articles for AAP) and 63.2 percent (4,454/7,051 articles for LBD).
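The workload-reduction percentages follow directly from the reported counts, and the thresholding rule is a simple comparison. The sketch below reproduces the arithmetic and applies the p ≥ 0.02 cutoff to a few made-up probabilities; the helper names and the example probabilities are assumptions for illustration, not outputs of the report's models.

```python
def workload_reduction_pct(avoided, retrieved):
    """Percentage of retrieved citations that would not need screening."""
    return 100.0 * avoided / retrieved


def flag_for_screening(probabilities, threshold=0.02):
    """Flag a citation for manual screening when its predicted p >= threshold."""
    return [p >= threshold for p in probabilities]


# Counts as reported in the results above.
aap_pct = workload_reduction_pct(1990, 3591)
lbd_pct = workload_reduction_pct(4454, 7051)
print(f"AAP: {aap_pct:.1f} percent, LBD: {lbd_pct:.1f} percent")

# Hypothetical predicted inclusion probabilities for four citations.
example_flags = flag_for_screening([0.001, 0.02, 0.35, 0.019])
print(example_flags)
```

A cutoff as low as 0.02 reflects the priority placed on sensitivity: a citation is sent to reviewers even when the model considers it unlikely to be relevant.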


In this pilot study, we evaluated statistical classifiers that used previous classification decisions and key explanatory variables derived from MEDLINE indexing terms to predict inclusion decisions on two simulated comparative effectiveness review updates. The system achieved higher sensitivity in evaluating efficacy/effectiveness articles than in evaluating LBD AE articles. In the simulation, this prototype system reduced workload associated with screening updated search results for all relevant efficacy/effectiveness and AE articles by more than 50 percent with minimal or no loss of relevant articles. After refinement, these document classification algorithms could help researchers maintain up-to-date reviews.

Prepared for: Agency for Healthcare Research and Quality, U.S. Department of Health and Human Services¹, Contract No. 290-2007-10062-I. Prepared by: Southern California Evidence-based Practice Center, Santa Monica, CA.

Suggested citation:

Dalal SR, Shekelle PG, Hempel S, Newberry SJ, Motala A, Shetty KD. A Pilot Study Using Machine Learning and Domain Knowledge To Facilitate Comparative Effectiveness Review Updating. Methods Research Report (Prepared by the Southern California Evidence-based Practice Center under Contract No. 290-2007-10062-I). AHRQ Publication No. 12-EHC069-EF. Rockville, MD: Agency for Healthcare Research and Quality. September 2012.

This report is based on research conducted by the Southern California Evidence-based Practice Center (EPC) under contract to the Agency for Healthcare Research and Quality (AHRQ), Rockville, MD (Contract No. 290-2007-10062-I). The findings and conclusions in this document are those of the author(s), who are responsible for its contents; the findings and conclusions do not necessarily represent the views of AHRQ. Therefore, no statement in this report should be construed as an official position of AHRQ or of the U.S. Department of Health and Human Services.

The information in this report is intended to help health care decisionmakers—patients and clinicians, health system leaders, and policymakers, among others—make well-informed decisions and thereby improve the quality of health care services. This report is not intended to be a substitute for the application of clinical judgment. Anyone who makes decisions concerning the provision of clinical care should consider this report in the same way as any medical reference and in conjunction with all other pertinent information, i.e., in the context of available resources and circumstances presented by individual patients.

This report may be used, in whole or in part, as the basis for development of clinical practice guidelines and other quality enhancement tools, or as a basis for reimbursement and coverage policies. AHRQ or U.S. Department of Health and Human Services endorsement of such derivative products may not be stated or implied.

No investigators have any affiliations or financial involvement (e.g., employment, consultancies, honoraria, stock options, expert testimony, grants or patents received or pending, or royalties) that conflict with material presented in this report.


540 Gaither Road, Rockville, MD 20850; www​

Bookshelf ID: NBK109161; PMID: 23057094

