Genes, themes and microarrays: using information retrieval for large-scale gene analysis

Proc Int Conf Intell Syst Mol Biol. 2000:8:317-28.

Abstract

The immense volume of data resulting from DNA microarray experiments, accompanied by an increase in the number of publications discussing gene-related discoveries, presents a major data analysis challenge. Current methods for genome-wide analysis of expression data typically rely on cluster analysis of gene expression patterns. Clustering reveals potentially meaningful relationships among genes, but cannot explain the underlying biological mechanisms. To address this problem, we have developed a new approach that uses the literature to establish functional relationships among genes on a genome-wide scale. Our method is based on revealing coherent themes within the literature, using a similarity-based search in document space. Content-based relationships among abstracts are then translated into functional connections among genes. We describe preliminary experiments applying our algorithm to a database of documents discussing yeast genes. A comparison of the results with well-established yeast gene functions demonstrates the effectiveness of our approach.
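The two steps named in the abstract (a similarity-based search in document space, then translating relationships among abstracts into connections among genes) can be illustrated with a minimal sketch. The abstract does not give the details of the method, so everything here is an assumption made for illustration: the bag-of-words cosine similarity, the idea of one representative "kernel" abstract per gene, the overlap rule for linking genes, and all names and toy texts (`related_genes`, `GENE_TO_KERNEL`, `geneA`, and so on) are hypothetical rather than the authors' algorithm or data.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a text into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_abstracts(kernel_id, vectors, k):
    """Ids of the k abstracts most similar to the kernel (kernel included)."""
    kernel = vectors[kernel_id]
    ranked = sorted(vectors, key=lambda d: cosine(kernel, vectors[d]), reverse=True)
    return set(ranked[:k])


def related_genes(gene_to_kernel, abstracts, k=2, min_shared=1):
    """Link two genes when their kernels' neighbour sets share abstracts.

    Assumption: sharing at least `min_shared` similar abstracts is taken
    as a sign of a functional connection. This rule is illustrative only.
    """
    vectors = {doc_id: vectorize(text) for doc_id, text in abstracts.items()}
    neighbours = {
        gene: nearest_abstracts(doc_id, vectors, k)
        for gene, doc_id in gene_to_kernel.items()
    }
    genes = sorted(neighbours)
    pairs = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if len(neighbours[g1] & neighbours[g2]) >= min_shared:
                pairs.append((g1, g2))
    return pairs


# Hypothetical toy corpus; not real abstracts or real gene names.
ABSTRACTS = {
    "d1": "cell cycle kinase regulates mitosis",
    "d2": "kinase controls cell cycle progression",
    "d3": "membrane transporter imports glucose",
    "d4": "glucose uptake across membrane",
}
GENE_TO_KERNEL = {"geneA": "d1", "geneB": "d2", "geneC": "d3"}

pairs = related_genes(GENE_TO_KERNEL, ABSTRACTS)
```

On this toy corpus, `geneA` and `geneB` are linked because their kernel abstracts retrieve the same two cell-cycle documents, while `geneC` retrieves only the glucose-transport documents and stays unconnected. A realistic system would need term weighting, stop-word handling, and a far larger document collection.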

MeSH terms

  • Animals
  • DNA / genetics*
  • Humans
  • Oligonucleotide Array Sequence Analysis*
  • Sequence Analysis, DNA / methods*

Substances

  • DNA