The Global Error Assessment (GEA) model for the selection of differentially expressed genes in microarray data

Bioinformatics. 2004 Nov 1;20(16):2726-37. doi: 10.1093/bioinformatics/bth319. Epub 2004 May 14.

Abstract

Motivation: Microarray technology has become a powerful research tool in many fields of study; however, the cost of microarrays often limits experiments to a low number of replicates (k). When k is low, it becomes difficult to perform standard statistical tests to extract the most biologically significant experimental results. More advanced statistical tests have been developed; however, they often remain difficult to implement and interpret in routine biological research. The present work outlines a method that achieves sufficient statistical power for selecting differentially expressed genes under conditions of low k, while remaining an intuitive and computationally efficient procedure.

Results: The present study describes a Global Error Assessment (GEA) methodology for selecting differentially expressed genes in microarray datasets. The methodology was developed using an in vitro experiment that compared control and interferon-gamma treated skin cells. In this experiment, up to nine replicates were used to confidently estimate error, thereby enabling methods of different statistical power to be compared. Genes of similar absolute expression are binned to enable a highly accurate local estimate of the mean squared error within conditions. The model then relates variability of gene expression in each bin to absolute expression levels and uses this relationship in a test derived from the classical ANOVA. The GEA selection method is compared with both the classical and permutational ANOVA tests, and demonstrates increased stability, robustness and confidence in gene selection. A subset of the selected genes was validated by real-time reverse transcription-polymerase chain reaction (RT-PCR). Together, these results suggest that the GEA methodology is (i) suitable for selection of differentially expressed genes in microarray data, (ii) intuitive and computationally efficient and (iii) especially advantageous under conditions of low k.
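
The binning idea described in the Results can be illustrated with a minimal sketch. This is not the authors' R implementation: the function names (`within_mse`, `between_ms`, `binned_f_statistics`), the fixed `bin_size` parameter and the input layout are all assumptions made for illustration. The sketch also simplifies the method by using each bin's average error directly, rather than fitting a model that relates variability to absolute expression, and it returns only an F-like statistic without computing p-values.

```python
from statistics import mean


def within_mse(groups):
    """Pooled within-condition mean squared error for one gene.

    groups: list of replicate lists, one list per condition.
    Needs more observations than conditions, otherwise df is zero.
    """
    ss = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df = sum(len(g) for g in groups) - len(groups)
    return ss / df


def between_ms(groups):
    """Between-condition mean square (the classical ANOVA numerator)."""
    grand = mean(x for g in groups for x in g)
    ss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    return ss / (len(groups) - 1)


def binned_f_statistics(genes, bin_size):
    """Return an F-like statistic per gene using a bin-level error estimate.

    genes: dict mapping a gene name to its per-condition replicate lists.
    bin_size: hypothetical tuning parameter, number of genes per bin.
    """
    # Sort genes by overall mean expression so that each bin groups
    # genes with a similar absolute expression level.
    ordered = sorted(genes, key=lambda n: mean(x for g in genes[n] for x in g))
    stats = {}
    for start in range(0, len(ordered), bin_size):
        bin_names = ordered[start:start + bin_size]
        # Local error estimate: average of the per-gene within-condition MSEs.
        bin_mse = mean(within_mse(genes[n]) for n in bin_names)
        for n in bin_names:
            # Assumption: the bin-level MSE replaces the per-gene denominator.
            stats[n] = between_ms(genes[n]) / bin_mse
    return stats


# Toy data (invented for illustration): two conditions, two replicates each.
example_genes = {
    "g1": [[1.0, 1.2], [1.1, 0.9]],
    "g2": [[1.0, 1.4], [3.0, 3.4]],
    "g3": [[8.0, 8.2], [8.1, 8.3]],
    "g4": [[9.0, 9.4], [9.2, 9.0]],
}
example_stats = binned_f_statistics(example_genes, bin_size=2)
```

In this toy example, g1 and g2 share a bin, so both are tested against the same pooled error. Pooling across genes in this way is what allows the error estimate to rest on more observations than a single gene with low k could supply.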

Availability: The GEA code for R software is freely available from the authors upon request.

Publication types

  • Comparative Study
  • Evaluation Study
  • Validation Study

MeSH terms

  • Algorithms*
  • Analysis of Variance
  • Animals
  • Cell Line
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation / physiology*
  • Humans
  • Interferon-gamma / pharmacology
  • Models, Genetic*
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Skin / drug effects
  • Skin / metabolism*
  • Software
  • Statistics as Topic / methods

Substances

  • Interferon-gamma