NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Paynter R, Bañez LL, Berliner E, et al. EPC Methods: An Exploration of the Use of Text-Mining Software in Systematic Reviews [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2016 Apr.


Methods

General Approach

This project's overall aim was to provide a snapshot of the state of knowledge on the use of text mining in systematic reviews, providing groundwork for future methodologic work in this area. Given the project's exploratory nature, we adopted a multipronged approach to illustrate how text-mining tools have supported various steps in systematic review processes and, secondarily, different types of reviews. We conducted a literature review to identify existing research on text-mining use. We augmented this information with insights from Key Informants to capture senior investigator/organizational perspectives and information specialist/research team member perspectives on text mining. Lastly, we provided a descriptive evaluation of specific text-mining tools/software used to support systematic review processes.

A workgroup composed of members from the EPCs, the Scientific Resource Center (SRC), and AHRQ participated in weekly workgroup teleconference calls over a three-month period to discuss the direction and scope of the project, assign and coordinate tasks, collect and analyze data, and discuss and edit draft documents. The workgroup consisted of three professional librarians (EPC and SRC members), an EPC Project Manager, an EPC Senior Analyst, and two AHRQ Task Order Officers.

Initially, this exploratory research project intended to cover all steps within the systematic review process equally across the literature review and interviews; however, our emphasis changed early on because we found several recent existing systematic reviews that covered screening and data abstraction.14-16 Thus, this preliminary sketch of the use of text-mining tools within systematic review processes attempts to cover searching and other less well-studied steps more comprehensively while summarizing the existing systematic reviews.

Text mining covers various techniques and tools used to detect patterns and extract knowledge from unstructured natural language text. Text mining uses statistical approaches to explore (e.g., co-occurrence, frequencies of words) and categorize (e.g., clustering, classification) text-based information to support knowledge discovery while minimizing human effort. We considered a text-mining tool to be any software or application used to aid the process of text mining. We included resources that our Key Informants identified as text-mining tools even though they are traditionally used for other purposes (e.g., EPPI-Reviewer, EndNote, Microsoft Excel).
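To make the statistical approaches above concrete, the following minimal Python sketch computes word frequencies and within-document co-occurrence counts, the kinds of exploratory statistics that text-mining tools build on. The toy "abstracts" are invented for illustration and are not drawn from the report's data:

```python
from collections import Counter
from itertools import combinations

# Toy corpus of abstract snippets (illustrative only, not from the report's data)
abstracts = [
    "text mining supports systematic review screening",
    "machine learning supports screening of citations",
    "text mining aids systematic review searching",
]

# Word frequencies across the corpus (a simple exploratory statistic)
freq = Counter(word for a in abstracts for word in a.split())

# Pairwise co-occurrence: how often two words appear in the same abstract
cooc = Counter()
for a in abstracts:
    for pair in combinations(sorted(set(a.split())), 2):
        cooc[pair] += 1

print(freq.most_common(3))
print(cooc[("mining", "text")])  # "text" and "mining" co-occur in 2 abstracts
```

Real tools layer clustering or classification on top of statistics like these; this sketch only shows the counting step.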

Literature Review

We searched a range of bibliographic databases and gray literature sources to identify candidate publications. Because of time constraints on the research project, we limited bibliographic searches by publication date (2005–2015) and to English-language publications. In addition to major biomedical databases, we searched the computer/information science literature to improve recall of relevant content. Time constraints precluded a full systematic review; however, we used the following inclusion criteria:

  • Does this article address text mining within the context of the systematic review process?
  • Does this article address an area of text mining that is of interest to this report?

Publications focused on text mining of electronic health records and administrative datasets, although of interest, were outside the scope of this white paper. Publications that focused solely on technical aspects of text-mining algorithms were excluded.

Our searches identified 1,473 candidate citations. After duplicate removal, 670 unique citations were uploaded to DistillerSR for review. The full text of 122 articles was retrieved for data abstraction. We noted whether text mining was used in the searching, screening, data-extraction, updating, or other parts of the systematic review process.
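Duplicate removal of the kind described above is commonly done by matching normalized citation fields. The sketch below is a hedged illustration only: the record fields and titles are invented assumptions, not DistillerSR's actual data model, and real deduplication typically also compares authors, year, and identifiers such as DOIs:

```python
import re

# Toy citation records (titles invented for illustration)
citations = [
    {"title": "Text Mining for Systematic Reviews", "source": "MEDLINE"},
    {"title": "Text mining for systematic reviews.", "source": "Embase"},  # duplicate
    {"title": "Automating Study Screening", "source": "MEDLINE"},
]

def normalize(title: str) -> str:
    """Lowercase and strip punctuation so trivial variants match."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()

seen, unique = set(), []
for c in citations:
    key = normalize(c["title"])
    if key not in seen:
        seen.add(key)
        unique.append(c)

print(len(unique))  # duplicates collapsed; 2 unique citations remain
```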

Given this review's rapid nature, we decided to rely on two 2015 systematic reviews that covered the use of text mining in the screening and data abstraction part of the systematic review process rather than conducting a de novo review for those areas.14,15 We discuss these reviews and studies focusing on searching and updating processes later in this report. Full details of strategy development, databases searched, and search strings are available in Appendix A.

Key Informant Interviews

We conducted Key Informant (KI) interviews with senior investigators from systematic review research organizations (n=4) and information specialists who have used text-mining tools to develop systematic review search strategies (n=4) to form a preliminary understanding of the experiences and insights of researchers who have used these tools. We split the KIs into two groups for two reasons: 1) we thought the senior representatives were likely to have more experience with the use of text-mining tools in the screening phase of systematic reviews (reflecting the more extensive published literature that exists on this phase overall), and 2) the workgroup wanted to focus on the searching phase to begin fleshing out the use of text mining in this step in greater detail. In compliance with the Paperwork Reduction Act Office of Management and Budget regulation (5 C.F.R. § 1320), the sample of KIs was limited to nine or fewer nonfederal employees. One of this report's team members conducted the interviews during July and August 2015 using semistructured interview instruments. At least two additional team members also attended each interview.

We identified potential KIs in the following ways: 1) by reviewing authors of relevant published literature, 2) by sending emails to librarian discussion lists to recruit potential participants (i.e., Cochrane IRMG, MLA Expert Searching, HTAi ISG-Info Resources), and 3) via contacts within the systematic review community. We invited 14 individuals to participate as KIs in an (approximately) 60-minute individual telephone interview; eight agreed and were interviewed, and six declined. KIs are listed in the Key Informants section of this report and are quoted anonymously in the text. All KIs had experience using text-mining tools in multiple reviews. All information specialist KIs are masters-level medical librarians, so information specialist and librarian are used synonymously hereafter.

All interviews were intended to be audio-recorded and transcribed; however, due to technical issues, two of the interviews were not recorded. For these interviews, notes taken by three workgroup members were analyzed instead (these comments appear inside square brackets in Table C-2 to distinguish them from verbatim quotes). Scientific Resource Center methods research projects fall under the Portland VA Research Foundation Institutional Review Board's blanket exemption from ethics review; thus, no approval was sought for this project. At the beginning of each call, we asked KIs for permission to record the call for later analysis and to quote them anonymously; all KIs verbally agreed to these conditions. Each KI completed an "EPC Conflict of Interest Disclosure Form" before being interviewed, and no disclosed conflicts precluded participation by any of the informants. All participants received a copy of the questions ahead of the scheduled conference call.

Interview Guide

The workgroup developed the interview guide through review and discussion over multiple iterations. We developed two separate sets of questions, one for senior investigators/organizational representatives and one for librarians/research team members. Please see Appendix B for a copy of the interview guide.

Data Analysis

Transcripts were analyzed using a constructivist grounded theory approach in NVivo™ 10 software by one investigator with qualitative analysis experience who developed the descriptive coding structure and themes.17-21 The larger workgroup reviewed the original transcripts and critiqued the analysis. Please see Appendix C for a table outlining specific text-mining tools with more extended comments by librarians.

Tools Catalog

We compiled a list of text-mining tools identified in the literature and from broad web-based searches. We created a table to summarize features and describe characteristics, accessibility, and potential applications to the systematic review process.

Two team members examined prespecified characteristics and cross-referenced features with those mentioned by Key Informants and identified in the literature search. We elected to focus on components and features likely relevant to topic refinement, literature searching, study selection, and data extraction for systematic reviews. We informally evaluated the potential for a tool feature to support one or more of these key steps of the systematic review. Our subjective assessment of tool utility and relevance was informed by the team's collective experience developing and executing comprehensive literature searches, as well as by knowledge of the selection, extraction, and appraisal process derived from guidance and standards issued by the EPC Program for conducting comparative-effectiveness reviews and from international reporting standards for the various stages of a comparative-effectiveness review.22

We did not include information-processing products or services (e.g., Doctor Evidence) unless they were mentioned specifically by the Key Informants (e.g., EndNote, DistillerSR, EPPI-Reviewer). We did not examine machine learning or tools designed exclusively to extract or describe named relationships (e.g., genetic and biologic entity recognition). The term "text mining" frequently captures tools designed to extract and classify granular information from the molecular biology literature. Although similar in concept and underlying mechanism, we did not include those in our catalog. Readers who are interested in detailed explanations and comparisons of the component tasks and methods (e.g., preprocessing, context representation, content selection) will find ample information elsewhere, particularly within the bioinformatics, computer and engineering sciences, and biostatistics literature.23-29

We rated a tool as applicable to systematic reviews if the tool was designed to support systematic review conduct or could be adapted to improve or augment existing systematic review tools or methods. We assessed each tool for functionality to enhance a) topic refinement, scope, or question development; b) searching or retrieval of literature or candidate data; c) screening or eligibility assessment; or d) data extraction or synthesis. We included text-mining tools with features to support overall quality or efficiency of one or more steps in the review process.

Table 2 lists the labels and definitions for the variables that we prespecified for the characterization of text-mining tool features. Given the varying levels of sophistication of tools and the technical support required for installation and/or setup, we did not test the tools or applications for relative performance or precision. As a preliminary assessment, our group focused on availability, capability, stability, and usability. Where possible, we listed key features. We established definitions for the categories and choices to ensure a degree of comparability and enable meaningful classifications. We defined “tool” as any application, resource, software program, software feature, open-source code, or web-based resource intended to automate or facilitate information analysis.

Table 2. Prespecified items to characterize text-mining tools.


Prioritization of Tools Assessment

We prioritized our assessment of tools as follows: tools that were out of scope or unlikely to be applicable to systematic reviews (e.g., gene or protein data-mining tools) were rated as having "low" or "no" applicability to the systematic review process; we did not download or install software to evaluate, focusing this assessment on tools available via the web; and we did not evaluate proprietary products and ceased assessment when testing a strategy or text document did not work properly.
