Ann N Y Acad Sci. 2009 Mar;1158:14-28. doi: 10.1111/j.1749-6632.2008.03750.x.

Creating reference datasets for systems biology applications using text mining.

Author information

Structural Biology and Biocomputing Group, Spanish National Cancer Research Centre, Madrid, Spain.


High-throughput experimental techniques are generating large data collections with the aim of identifying novel entities involved in fundamental cellular processes and of drawing a systematic picture of the relationships between individual components. Determining the accuracy of the resulting data and selecting a subset of targets for more careful characterization often require relying on information provided by manually annotated data repositories. These repositories are incomplete and cover only a small fraction of the knowledge contained in the literature. In this paper we propose using text-mining technologies to extract, organize, and present information relevant to a particular biological topic. The aims of the resulting approach are (1) to enable topic-centric navigation of the biological literature, (2) to assist in the construction of manually revised data repositories, (3) to prioritize biological entities for experimental studies, and (4) to enable human interpretation of large-scale experiments by linking bio-entities directly to relevant descriptions in the literature.

[Indexed for MEDLINE]
