BMC Bioinformatics. 2013 Feb 7;14:42. doi: 10.1186/1471-2105-14-42.

A new unsupervised gene clustering algorithm based on the integration of biological knowledge into expression data.

Author information

Applied Mathematics Department, Agrocampus Ouest, 65, rue de Saint-Brieuc, Rennes, France. marie.verbanck@agrocampus-ouest.fr

Abstract

BACKGROUND:

Gene clustering algorithms are widely used by biologists when analysing omics data. Classical gene clustering strategies rely on expression data only, either directly, as in Heatmaps, or indirectly, as in clustering based on coexpression networks. However, these strategies may not be sufficient to bring out all potential relationships amongst genes.

RESULTS:

We propose a new unsupervised gene clustering algorithm based on the integration of external biological knowledge, such as Gene Ontology annotations, into expression data. We introduce a new distance between genes that integrates biological knowledge into the analysis of expression data. Under this distance, two genes are close only if they have both similar expression profiles and similar functional profiles. A classical algorithm (e.g. K-means) is then used to obtain gene clusters. In addition, we propose an automatic procedure for evaluating gene clusters. This procedure is based on two indicators that measure the global coexpression and the biological homogeneity of gene clusters. Each indicator is associated with a hypothesis test, which complements it with a p-value. Our clustering algorithm is compared with Heatmap clustering and with clustering based on a gene coexpression network, on both simulated and real data. In both cases, it outperforms the other methodologies, as it provides the highest proportion of significantly coexpressed and biologically homogeneous gene clusters, which are good candidates for interpretation.
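The idea of a distance that combines expression and functional profiles, followed by K-means, can be illustrated with a minimal sketch. The abstract does not define the distance, so everything specific here is an assumption made for illustration: the weighted concatenation in `combine`, the `weight` parameter, the binary annotation matrix, the farthest-first initialisation, and the toy data. It should not be read as the authors' method.

```python
import math

def standardize(rows):
    """Centre and scale each column; constant columns become all zeros."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) for c, m in zip(cols, means)]
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
            for row in rows]

def combine(expression, annotations, weight=0.5):
    """ASSUMPTION: concatenate both blocks so that the squared Euclidean distance
    between two genes is a weighted sum of their expression and annotation distances."""
    e, a = standardize(expression), standardize(annotations)
    # Divide by block width so a block does not dominate just by having more columns.
    we = math.sqrt((1.0 - weight) / len(e[0]))
    wa = math.sqrt(weight / len(a[0]))
    return [[we * x for x in er] + [wa * x for x in ar] for er, ar in zip(e, a)]

def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def farthest_first(points, k):
    """Deterministic initialisation: start at the first gene, then repeatedly
    add the gene farthest from the centres chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; stops when the assignment no longer changes."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

# Hypothetical toy data: 6 genes x 3 samples, and 6 genes x 2 annotation terms (1 = annotated).
expression = [[1.0, 2.0, 5.0], [1.2, 2.1, 4.8], [0.9, 1.8, 5.1],
              [5.0, 4.0, 1.0], [4.9, 4.2, 1.1], [5.2, 3.9, 0.8]]
annotations = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]

labels = kmeans(combine(expression, annotations, weight=0.5), k=2)
print(labels)
```

In this sketch, `weight=0` reduces to clustering on expression alone and `weight=1` to clustering on annotations alone; intermediate values require genes to agree on both blocks to end up close.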

CONCLUSION:

Our new clustering algorithm provides a higher proportion of good candidates for interpretation. Therefore, we expect the interpretation of these clusters to help biologists formulate new hypotheses on the relationships amongst genes.
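Whether a cluster counts as a good candidate depends on the evaluation procedure, in which each indicator is paired with a hypothesis test. The abstract does not specify the indicators or the tests, so the sketch below makes its own assumptions: the coexpression indicator is taken to be the mean pairwise Pearson correlation within a cluster, and the p-value comes from a permutation test against random gene sets of the same size. The function names and the toy data are hypothetical.

```python
import itertools
import math
import random

def pearson(u, v):
    """Pearson correlation of two profiles; 0.0 if either profile is constant."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = math.sqrt(sum((x - mu) ** 2 for x in u))
    sv = math.sqrt(sum((y - mv) ** 2 for y in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0

def coexpression(expression, genes):
    """ASSUMED indicator: mean pairwise correlation among the given genes (needs >= 2 genes)."""
    pairs = list(itertools.combinations(genes, 2))
    return sum(pearson(expression[i], expression[j]) for i, j in pairs) / len(pairs)

def permutation_p_value(expression, cluster, n_perm=999, seed=0):
    """ASSUMED test: compare the cluster's indicator with random gene sets of equal size."""
    rng = random.Random(seed)
    observed = coexpression(expression, cluster)
    all_genes = range(len(expression))
    hits = sum(
        coexpression(expression, rng.sample(all_genes, len(cluster))) >= observed
        for _ in range(n_perm)
    )
    # The +1 terms count the observed cluster itself, so the p-value is never exactly 0.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 6 genes x 3 samples; genes 0-2 rise, genes 3-5 fall.
expression = [[1.0, 2.0, 5.0], [1.2, 2.1, 4.8], [0.9, 1.8, 5.1],
              [5.0, 4.0, 1.0], [4.9, 4.2, 1.1], [5.2, 3.9, 0.8]]

observed, p_value = permutation_p_value(expression, cluster=[0, 1, 2])
print(observed, p_value)
```

A biological homogeneity indicator could be tested in the same way by swapping `coexpression` for a similarity measure computed on annotation profiles.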

PMID: 23387364
PMCID: PMC3635920
DOI: 10.1186/1471-2105-14-42
[Indexed for MEDLINE]
Free PMC Article