Text Mining for Drugs and Chemical Compounds: Methods, Tools and Applications

Miguel Vazquez; Martin Krallinger; Florian Leitner; Alfonso Valencia

doi:10.1002/minf.201100005


Mol Inform. 2011 Jun;30(6-7):506-19. doi: 10.1002/minf.201100005. Epub 2011 Jul 12.

Authors

Miguel Vazquez¹, Martin Krallinger¹, Florian Leitner¹, Alfonso Valencia²

Affiliations

¹ Centro Nacional de Investigaciones Oncológicas, Biología Computacional y Estructural, Madrid, Spain.
² Centro Nacional de Investigaciones Oncológicas, Biología Computacional y Estructural, Madrid, Spain. valencia@cnio.es.

PMID: 27467152

Abstract

Prior knowledge about the biological properties of chemicals, such as kinetic values, protein targets, or toxic effects, can facilitate many aspects of drug development. Chemical information is accumulating rapidly in all kinds of free-text documents, such as patents, industry reports, and scientific articles, which has motivated the development of specifically tailored text mining applications. Despite the potential gains, chemical text mining still faces significant challenges, one of the most salient being the recognition of chemical entities mentioned in text. To help practitioners contribute to this area, a substantial part of this review is devoted to this issue and presents the basic concepts and principles underlying the main strategies. The technical details are introduced together with relevant bibliographic references. Other tasks discussed include retrieving relevant articles, identifying relationships between chemicals and other entities, and determining the structures of chemicals mentioned in text. This review also introduces a number of published applications that can be used to build pipelines on topics such as drug side effects, toxicity, and protein-disease-compound network analysis. We conclude with an outlook on how we expect the field to evolve, discussing both its possibilities and its current limitations.
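To make the chemical entity recognition task concrete, here is a minimal sketch of one common baseline: dictionary (lexicon) lookup that reports the character offsets of each chemical name found in a sentence. It is not the authors' method. The lexicon and the helper names `build_matcher` and `tag_chemicals` are hypothetical and only for illustration; practical systems rely on large curated vocabularies and usually combine lookup with rule-based or machine-learning components.

```python
import re

# Hypothetical toy lexicon; a real system would load a large curated vocabulary.
LEXICON = {"aspirin", "acetylsalicylic acid", "salicylic acid", "ibuprofen"}


def build_matcher(lexicon):
    """Compile one case-insensitive regex matching any lexicon term as a whole word.

    Terms are sorted longest first because Python's regex alternation takes the
    first alternative that matches, so "acetylsalicylic acid" wins over shorter terms.
    """
    terms = sorted(lexicon, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    # (?<!\w) and (?!\w) stop matches inside longer words,
    # e.g. "salicylic acid" inside "acetylsalicylic acid".
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def tag_chemicals(text, matcher):
    """Return (start, end, surface_form) tuples for each dictionary hit in text."""
    return [(m.start(), m.end(), m.group(0)) for m in matcher.finditer(text)]


matcher = build_matcher(LEXICON)
sentence = "Aspirin (acetylsalicylic acid) and ibuprofen inhibit COX enzymes."
hits = tag_chemicals(sentence, matcher)
for start, end, surface in hits:
    print(start, end, surface)
```

Preferring the longest match matters for chemical names, which are often multi-word terms that contain other valid names. A pure lookup approach cannot, however, recognize names that are absent from the lexicon, such as novel systematic (IUPAC) names.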

Keywords: Chemical compounds; Drugs; Information extraction; Named entity recognition; Text mining.

Publication types

Review