Genome Biol. 2002;3(12):RESEARCH0067. Epub 2002 Nov 21.

Strong-association-rule mining for large-scale gene-expression data analysis: a case study on human SAGE data.

Author information

  • Equipe Signalisations et identités cellulaires, Centre de Génétique Moléculaire et Cellulaire CNRS UMR 5534, Université Claude Bernard Lyon 1, 16 rue Dubois, F-69622 Villeurbanne cedex, France.

Abstract

BACKGROUND:

The association-rules discovery (ARD) technique has yet to be applied to gene-expression data analysis. Even in the absence of previous biological knowledge, it should identify sets of genes whose expression is correlated. The first association-rule miners appeared six years ago and proved efficient at dealing with sparse and weakly correlated data. A huge international research effort has led to new algorithms for tackling difficult contexts and these are particularly suited to analysis of large gene-expression matrices. To validate the ARD technique we have applied it to freely available human serial analysis of gene expression (SAGE) data.

RESULTS:

The approach described here enables us to designate sets of strong association rules. We normalized the SAGE data before applying our association rule miner. Depending on the discretization algorithm used, different properties of the data were highlighted. Both common and specific interpretations could be made from the extracted rules. In each and every case the extracted collections of rules indicated that a very strong co-regulation of mRNA encoding ribosomal proteins occurs in the dataset. Several rules associating proteins involved in signal transduction were obtained and analyzed, some pointing to yet-unexplored directions. Furthermore, by examining a subset of these rules, we were able both to reassign a wrongly labeled tag, and to propose a function for an expressed sequence tag encoding a protein of unknown function.
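As a rough illustration of the pipeline sketched in the abstract (discretize the expression matrix, then extract rules that meet support and confidence thresholds), the following minimal Python sketch may help. It is not the authors' miner: the toy matrix, the gene names, the "fraction of maximum" discretization, the threshold values, and the restriction to single-gene rules are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy matrix: one row per library, one normalized value per gene.
EXPRESSION = {
    "lib1": {"geneA": 12.0, "geneB": 9.0, "geneC": 1.0},
    "lib2": {"geneA": 10.0, "geneB": 8.0, "geneC": 0.0},
    "lib3": {"geneA": 1.0, "geneB": 0.0, "geneC": 7.0},
    "lib4": {"geneA": 11.0, "geneB": 7.0, "geneC": 6.0},
}


def discretize(matrix, fraction=0.5):
    """Turn each library into the set of genes considered over-expressed.

    Assumed rule: a gene is over-expressed in a library when its value is
    at least `fraction` of that gene's maximum across all libraries.
    """
    genes = sorted({g for row in matrix.values() for g in row})
    maxima = {g: max(row.get(g, 0.0) for row in matrix.values()) for g in genes}
    return {
        lib: frozenset(
            g for g in genes
            if maxima[g] > 0 and row.get(g, 0.0) >= fraction * maxima[g]
        )
        for lib, row in matrix.items()
    }


def support(itemset, transactions):
    """Fraction of libraries whose over-expressed set contains `itemset`."""
    hits = sum(1 for genes in transactions.values() if itemset <= genes)
    return hits / len(transactions)


def mine_rules(transactions, min_support=0.5, min_confidence=0.9):
    """Return (lhs, rhs, support, confidence) for single-gene rules lhs -> rhs."""
    genes = sorted(set().union(*transactions.values()))
    rules = []
    for a, b in combinations(genes, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s_both = support(frozenset((lhs, rhs)), transactions)
            s_lhs = support(frozenset((lhs,)), transactions)
            if s_both >= min_support and s_lhs > 0:
                confidence = s_both / s_lhs
                if confidence >= min_confidence:
                    rules.append((lhs, rhs, s_both, confidence))
    return rules


if __name__ == "__main__":
    for lhs, rhs, sup, conf in mine_rules(discretize(EXPRESSION)):
        print(f"{lhs} -> {rhs} (support={sup:.2f}, confidence={conf:.2f})")
```

Changing the discretization rule changes which genes count as over-expressed in each library, and therefore which rules pass the thresholds, which is consistent with the abstract's remark that different discretization algorithms highlighted different properties of the data.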

CONCLUSIONS:

We show that ARD is a promising technique that turns out to be complementary to existing gene-expression clustering techniques.

PMID: 12537556 [PubMed - indexed for MEDLINE]
PMCID: PMC151169 (Free PMC Article)
