Principal components analysis and the reported low intrinsic dimensionality of gene expression microarray data

Sci Rep. 2016 Jun 2;6:25696. doi: 10.1038/srep25696.

Abstract

Principal components analysis (PCA) is a common unsupervised method for the analysis of gene expression microarray data, providing information on the overall structure of the analyzed dataset. In recent years, it has been applied to very large datasets involving many different tissues and cell types in order to create a low-dimensional global map of human gene expression. Here, we reevaluate this approach and show that the linear intrinsic dimensionality of this global map is higher than previously reported. Furthermore, we analyze the cases in which PCA fails to detect biologically relevant information and point the reader to methods that overcome these limitations. Our results refine the current understanding of the overall structure of gene expression spaces and show that the ability of PCA to recover a biological signal depends critically on the effect size of that signal as well as on the fraction of samples containing it.
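The dependence on effect size and sample fraction described above can be illustrated with a small simulation. The following is a minimal sketch on synthetic data, not the analysis performed in the paper: the function name `pc1_label_correlation`, the Gaussian noise model, and all parameter values (200 samples, 500 genes, 50 signal genes) are assumptions chosen only for illustration. It shifts a block of genes in a subset of samples and reports how strongly the first principal component tracks that subset.

```python
import numpy as np


def pc1_label_correlation(effect_size, fraction, n_samples=200, n_genes=500,
                          n_signal_genes=50, seed=0):
    """Absolute correlation between PC1 scores and the signal-carrier labels.

    Hypothetical toy model: unit-variance Gaussian noise plus a constant
    shift of `effect_size` in the first `n_signal_genes` genes for a
    `fraction` of the samples.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_genes))

    # Mark the samples that carry the signal and shift their signal genes.
    n_carriers = max(1, int(round(fraction * n_samples)))
    labels = np.zeros(n_samples)
    labels[:n_carriers] = 1.0
    X[:n_carriers, :n_signal_genes] += effect_size

    # PCA via SVD of the gene-wise centered matrix (samples in rows).
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    pc1_scores = U[:, 0] * S[0]

    return abs(np.corrcoef(pc1_scores, labels)[0, 1])


# Same effect size, common versus rare signal; then a common but weak signal.
common_strong = pc1_label_correlation(effect_size=1.0, fraction=0.5)
rare_strong = pc1_label_correlation(effect_size=1.0, fraction=0.01)
common_weak = pc1_label_correlation(effect_size=0.2, fraction=0.5)

print(f"common, strong signal: {common_strong:.2f}")
print(f"rare, strong signal:   {rare_strong:.2f}")
print(f"common, weak signal:   {common_weak:.2f}")
```

In this toy setting, the first component aligns with the group labels only when the signal is both large enough and present in enough samples; a signal confined to a few samples, or one with a small effect, contributes less variance than the leading noise direction and does not appear in the first component.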

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Gene Expression Profiling / methods*
  • Humans
  • Molecular Sequence Annotation / methods*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Principal Component Analysis / methods*