Assessing the Accuracy of Google Translate to Allow Data Extraction From Trials Published in Non-English Languages

Ethan M Balk; Mei Chung; Minghua L Chen; Thomas A Trikalinos; Lina Kong Win Chang

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Methods Research Reports

Investigators: Ethan M Balk, MD, MPH, Mei Chung, PhD, Minghua L Chen, MD, MPH, Thomas A Trikalinos, MD, PhD, and Lina Kong Win Chang, BS.

Rockville (MD): Agency for Healthcare Research and Quality (US); 2013 Jan.

Report No.: 12(13)-EHC145-EF

Structured Abstract

Background:

One of the strengths of systematic reviews is that they aim to include all relevant evidence. However, study eligibility is often restricted to the English language for practical reasons. Google Translate, a free Web-based resource for translation, has recently become available. However, it is unclear whether its translation accuracy is sufficient for systematic reviews. An earlier pilot study provided some evidence that data extraction from translated articles may be adequate but varies by language. To address several limitations of the pilot study, four collaborating Evidence-based Practice Centers conducted a more rigorous analysis of translations of articles from five languages.

Methods:

We included 10 randomized controlled trials in 5 languages (Chinese, French, German, Japanese, and Spanish). Eligible studies were trials that reported results data for each treatment group. Each article was translated into English using Google Translate, and the time required to translate each study was tracked. The original-language versions of the articles were double data extracted by fluent speakers and reconciled. Each English-translated article was extracted by two of eight researchers who did not speak the given language. These eight researchers also each extracted 10 English-language trials to serve as a control. Data extracted included eligibility criteria, study design features, outcomes reported, intervention and outcome descriptions, and results data for one continuous and/or one categorical outcome. We used a generalized linear mixed model to examine whether the probability of correctly extracting an item from a translated article is related to the language of original publication. The model used each extractor's accuracy in extracting the English-language trials to control for reviewer effects.
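
One way to write down a model of the kind described above is sketched here. The abstract does not give the exact specification, so the symbols, the logit link, and the article-level random intercept are assumptions made for illustration only.

```latex
\[
\mathrm{logit}\,\Pr(Y_{ijk} = 1)
  = \beta_0 + \beta_{\ell(j)} + \gamma\, a_i + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
\]
```

In this sketch, Y_{ijk} equals 1 when extractor i correctly extracts item k from translated article j, \ell(j) is the original language of article j, a_i is extractor i's accuracy on the English-language control trials, and u_j is a hypothetical random intercept for the article. The coefficient \beta_{\ell(j)} carries the language effect, and \gamma adjusts for reviewer skill.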

Results:

The time required to translate articles ranged from 5 minutes to about 1 hour for almost all articles, with an average of about 30 minutes. Extractors estimated that most Spanish articles required less than 5 additional minutes to extract because of translation issues, but about two-thirds of articles in the other languages required between 6 and 30 additional minutes for extraction. Analyses of the adjusted percentage of correct extractions across items and languages, and of the adjusted odds ratio of correct extractions compared with English, showed that, in general and across languages, the likelihood of correct extraction was greater for study design and intervention domain items than for outcome descriptions and, particularly, study results. Translated Chinese articles yielded the highest percentage of items (22 percent) that were incorrectly extracted more than half the time, but also the largest percentage of items (41 percent) that were extracted correctly more than 98 percent of the time. Relative to English, extractions of translated Spanish articles were the most accurate among the translated languages.
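
To make the odds-ratio comparison with English concrete, the following is a minimal sketch of an unadjusted odds ratio with a Wald confidence interval. It is not the report's method: the report's estimates are adjusted and come from a mixed model, whereas this uses a plain 2x2 table. The function names and all counts are hypothetical and are not taken from the report.

```python
import math


def odds_ratio(correct_a, total_a, correct_b, total_b):
    """Unadjusted odds ratio of a correct extraction in group A versus group B."""
    wrong_a = total_a - correct_a
    wrong_b = total_b - correct_b
    if min(correct_a, wrong_a, correct_b, wrong_b) <= 0:
        # The simple estimate is undefined when any cell of the 2x2 table is zero.
        raise ValueError("every cell of the 2x2 table must be positive")
    return (correct_a / wrong_a) / (correct_b / wrong_b)


def wald_interval(correct_a, total_a, correct_b, total_b, z=1.96):
    """Approximate 95% confidence interval computed on the log odds ratio scale."""
    wrong_a = total_a - correct_a
    wrong_b = total_b - correct_b
    log_or = math.log(odds_ratio(correct_a, total_a, correct_b, total_b))
    se = math.sqrt(1 / correct_a + 1 / wrong_a + 1 / correct_b + 1 / wrong_b)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts for illustration only (NOT data from the report):
# 80 of 100 items correct from translated articles, 90 of 100 from English ones.
or_value = odds_ratio(80, 100, 90, 100)
low, high = wald_interval(80, 100, 90, 100)
print(f"OR = {or_value:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

An odds ratio below 1 in this sketch would mean that correct extraction is less likely from translated articles than from English ones.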

Conclusion:

Translation generally required few resources. Across all languages, data extraction from translated articles was less accurate than from English-language articles, particularly and importantly for results data. Extraction was most accurate from translated Spanish articles and least accurate from translated Chinese articles. Google Translate may help reduce language bias; however, reviewers may need to be more cautious about using data from translated articles. There remains a tradeoff between completeness of systematic reviews (including all available studies) and risk of error (due to poor translation).

Prepared for: Agency for Healthcare Research and Quality, U.S. Department of Health and Human Services¹, Contract No. 290-2007-10055-I (Tufts EPC), 290-2007-10059-I (University of Ottawa EPC), 290-2007-10062-I (Southern California EPC), 290-2007-10063-I (ECRI Institute EPC).

Prepared by: Tufts Evidence-based Practice Center, Tufts Medical Center, Boston, MA, with input and assistance from University of Ottawa Evidence-based Practice Center, University of Ottawa, Ottawa, ON; Southern California Evidence-based Practice Center, Santa Monica, CA; and ECRI Institute Evidence-based Practice Center, Plymouth Meeting, PA.

Suggested citation:

Balk EM, Chung M, Chen ML, Trikalinos TA, Kong Win Chang L. Assessing the Accuracy of Google Translate To Allow Data Extraction From Trials Published in Non-English Languages. Methods Research Report. (Prepared by the Tufts Evidence-based Practice Center under Contract No. 290-2007-10055-I.) Rockville, MD: Agency for Healthcare Research and Quality. January 2013. AHRQ Publication No. 12(13)-EHC145-EF. www.effectivehealthcare.ahrq.gov/reports/final.cfm.

This report is based on research conducted by the Tufts Evidence-based Practice Center (EPC) under contract to the Agency for Healthcare Research and Quality (AHRQ), Rockville, MD (Contract No. 290-2007-10055-I). The findings and conclusions in this document are those of the author(s), who are responsible for its content, and do not necessarily represent the views of AHRQ. No statement in this report should be construed as an official position of AHRQ or of the U.S. Department of Health and Human Services.

The information in this report is intended to help health care decisionmakers—patients and clinicians, health system leaders, and policymakers, among others—make well-informed decisions and thereby improve the quality of health care services. This report is not intended to be a substitute for the application of clinical judgment. Anyone who makes decisions concerning the provision of clinical care should consider this report in the same way as any medical reference and in conjunction with all other pertinent information, i.e., in the context of available resources and circumstances presented by individual patients.

This report may be used, in whole or in part, as the basis for development of clinical practice guidelines and other quality enhancement tools, or as a basis for reimbursement and coverage policies. AHRQ or U.S. Department of Health and Human Services endorsement of such derivative products may not be stated or implied.

None of the investigators have any affiliations or financial involvement that conflicts with the material presented in this report.

1: 540 Gaither Road, Rockville, MD 20850; www.ahrq.gov

Bookshelf ID: NBK121304; PMID: 23427350
