BMC Bioinformatics. 2009 Jun 23;10:193. doi: 10.1186/1471-2105-10-193.

Filtering genes for cluster and network analysis.

Author information

  • Department of Biostatistics, University of Toronto, Toronto, Ontario, Canada. dtritch@rogers.com

Abstract

BACKGROUND:

Prior to cluster analysis or genetic network analysis, it is customary to filter, or remove, genes considered irrelevant from the set of genes to be analyzed. Often, genes whose variation across samples falls below an arbitrary threshold value are deleted. This can improve interpretability and reduce bias.
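
The conventional variance-threshold filter described above can be illustrated with a minimal sketch. It assumes a genes-by-samples expression matrix; the function name `variance_filter`, the toy data, and the threshold of 0.5 are hypothetical choices made for illustration and are not taken from the paper. It shows only the customary filter, not the methods the paper proposes.

```python
import numpy as np

def variance_filter(expr, threshold):
    """Keep genes (rows) whose sample variance across samples (columns)
    is at least `threshold`. Returns the filtered matrix and a boolean mask."""
    expr = np.asarray(expr, dtype=float)
    # Unbiased sample variance of each gene across samples.
    variances = expr.var(axis=1, ddof=1)
    keep = variances >= threshold
    return expr[keep], keep

# Toy expression matrix (hypothetical): 4 genes x 4 samples.
expr = np.array([
    [1.0, 1.1, 0.9, 1.0],   # nearly constant gene
    [0.0, 2.0, 4.0, 6.0],   # strongly varying gene
    [5.0, 5.0, 5.0, 5.0],   # constant gene
    [3.0, 1.0, 4.0, 0.0],   # varying gene
])

# The threshold is arbitrary, as the text notes for this kind of filter.
filtered, keep = variance_filter(expr, threshold=0.5)
print(keep)
print(filtered.shape)
```

Because the threshold is arbitrary, the set of retained genes, and hence any downstream clustering, can change with its value.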

RESULTS:

This paper introduces modular models for representing network structure in order to study the relative effects of different filtering methods. We show that cluster analysis and principal components are strongly affected by filtering. Filtering methods intended specifically for cluster and network analysis are introduced and compared by simulating modular networks with known statistical properties. To study more realistic situations, we analyze simulated "real" data based on well-characterized E. coli and S. cerevisiae regulatory networks.

CONCLUSION:

The methods introduced apply very generally, to any similarity matrix describing gene expression. One of the proposed methods, SUMCOV, performed well for all models simulated.

PMID: 19549335
PMCID: PMC2708160
DOI: 10.1186/1471-2105-10-193
[PubMed - indexed for MEDLINE]