Assessing the Accuracy of Google Translate to Allow Data Extraction From Trials Published in Non-English Languages [Internet]

Ethan M Balk; Mei Chung; Minghua L Chen; Thomas A Trikalinos; Lina Kong Win Chang

Review

Rockville (MD): Agency for Healthcare Research and Quality (US); 2013 Jan. Report No.: 12(13)-EHC145-EF.

AHRQ Methods for Effective Health Care.

Authors

Ethan M Balk¹, Mei Chung¹, Minghua L Chen¹, Thomas A Trikalinos¹, Lina Kong Win Chang¹

Affiliation

¹ Tufts Evidence-based Practice Center (EPC)

PMID: 23427350
Bookshelf ID: NBK121304

Excerpt

Background: One of the strengths of systematic reviews is that they aim to include all relevant evidence. However, study eligibility is often restricted to the English language for practical reasons. Google Translate, a free Web-based resource for translation, has recently become available. However, it is unclear whether its translation accuracy is sufficient for systematic reviews. An earlier pilot study provided some evidence that data extraction from translated articles may be adequate but varies by language. To address several limitations of the pilot study, four collaborating Evidence-based Practice Centers conducted a more rigorous analysis of translations of articles from five languages.

Methods: We included 10 randomized controlled trials in 5 languages (Chinese, French, German, Japanese, and Spanish). Eligible studies were trials that reported per-treatment group results data. Each article was translated into English using Google Translate. The time required to translate each study was tracked. The original language versions of the articles were double data extracted by fluent speakers and reconciled. Each English-translated article was extracted by two of eight researchers who did not speak the given language. These 8 researchers also each extracted 10 English-language trials to serve as a control. Data extracted included: eligibility criteria, study design features, outcomes reported, intervention and outcome descriptions, and results data for one continuous and/or one categorical outcome. We used a generalized linear mixed model to examine whether the probability of correctly extracting an item from a translated article is related to the language of original publication. The model used each extractor's accuracy in extracting the English language trials to control for reviewer effects.
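As a rough illustration of the kind of model described above, one plausible form of a logistic mixed model is sketched here. The exact specification is not given in this abstract, so every symbol and the choice of random effect are assumptions made for illustration only:

```latex
% Illustrative sketch only; the report's actual model may differ.
% Y_{ijk} = 1 if extractor i correctly extracted item k from translated article j
% \ell(j) = original language of article j (assumed fixed effect)
% a_i     = extractor i's accuracy on the English-language control trials (assumed covariate)
% u_j     = random intercept for article j (assumed random effect)
\operatorname{logit}\,\Pr(Y_{ijk} = 1) = \beta_0 + \beta_{\ell(j)} + \gamma\, a_i + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Under this assumed form, $\exp(\beta_{\ell})$ would be read as the odds ratio of a correct extraction for language $\ell$ relative to the reference category, after adjusting for the extractor's baseline accuracy.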

Results: The time required to translate articles ranged from 5 minutes to about 1 hour for almost all articles, with an average of about 30 minutes. Extractors estimated that most Spanish articles required less than 5 additional minutes to extract because of translation issues, but about two-thirds of articles in the other languages required between 6 and 30 additional minutes for extraction. Analyses of the adjusted percentage of correct extractions across items and languages, and of the adjusted odds ratio of correct extractions compared with English, revealed that, in general and across languages, the likelihood of correct extraction was greater for study design and intervention domain items than for outcome descriptions and, particularly, study results. Translated Chinese articles yielded the highest percentage of items (22 percent) that were incorrectly extracted more than half the time, but also the largest percentage of items (41 percent) that were extracted correctly more than 98 percent of the time. Relative to English, extractions from translated Spanish articles were the most accurate among the translated languages.
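To make the odds-ratio comparison concrete, the following minimal sketch computes an unadjusted odds ratio of correct extraction for a translated language relative to English, with a Wald 95 percent confidence interval. It is not the report's analysis: the report used adjusted estimates from a mixed model, and the counts, function names, and variable names below are hypothetical and chosen only for illustration.

```python
import math


def odds_ratio(correct_a, total_a, correct_b, total_b):
    """Unadjusted odds ratio of a correct extraction in group A relative to group B."""
    wrong_a = total_a - correct_a
    wrong_b = total_b - correct_b
    if min(correct_a, wrong_a, correct_b, wrong_b) <= 0:
        raise ValueError("all four cells of the 2x2 table must be positive")
    return (correct_a / wrong_a) / (correct_b / wrong_b)


def wald_ci(correct_a, total_a, correct_b, total_b, z=1.96):
    """Wald confidence interval for the odds ratio, computed on the log scale."""
    wrong_a = total_a - correct_a
    wrong_b = total_b - correct_b
    log_or = math.log(odds_ratio(correct_a, total_a, correct_b, total_b))
    se = math.sqrt(1 / correct_a + 1 / wrong_a + 1 / correct_b + 1 / wrong_b)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts (NOT from the report): items extracted correctly / items attempted.
translated = (160, 200)  # translated articles in some language
english = (180, 200)     # English-language control articles

or_value = odds_ratio(*translated, *english)
ci_low, ci_high = wald_ci(*translated, *english)
print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

An odds ratio below 1 with an interval that excludes 1 would indicate less accurate extraction from the translated articles than from the English ones. This unadjusted calculation ignores reviewer effects and the clustering of items within articles, which the mixed model in the report is designed to handle.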

Conclusion: Translation generally required few resources. Across all languages, data extraction from translated articles was less accurate than from English language articles, particularly and importantly for results data. Extraction was most accurate from translated Spanish articles and least accurate from translated Chinese articles. Google Translate has the potential to reduce language bias; however, reviewers may need to be more cautious about using data from translated articles. There remains a tradeoff between completeness of systematic reviews (including all available studies) and risk of error (due to poor translation).

Publication types

Review

Grants and funding

Prepared for: Agency for Healthcare Research and Quality, U.S. Department of Health and Human Services, Contract Nos. 290-2007-10055-I (Tufts EPC), 290-2007-10059-I (University of Ottawa EPC), 290-2007-10062-I (Southern California EPC), and 290-2007-10063-I (ECRI Institute EPC).

Prepared by: Tufts Evidence-based Practice Center, Tufts Medical Center, Boston, MA, with input and assistance from University of Ottawa Evidence-based Practice Center, University of Ottawa, Ottawa, ON; Southern California Evidence-based Practice Center, Santa Monica, CA; and ECRI Institute Evidence-based Practice Center, Plymouth Meeting, PA.