DFseq: Distribution-Free Method to Detect Differential Gene Expression for RNA-Sequencing Data

IEEE/ACM Trans Comput Biol Bioinform. 2020 Mar-Apr;17(2):558-565. doi: 10.1109/TCBB.2018.2866994. Epub 2018 Aug 27.

Abstract

Many current RNA-sequencing data analysis methods compare expression one gene at a time, taking little account of the correlations among genes. In this study, we propose a method that converts such a one-dimensional comparison into a two-dimensional evaluation of the ratio of the standard deviations (SDs) of two constructed random variables. This method identifies differentially expressed genes while controlling a preset significance level conditional on the mean-variance relationship of the read counts. Meanwhile, correlations among genes are naturally accommodated because genes with similar distributions cluster together in the proposed σ-σ plot. The proposed method is named DFseq (distribution-free) because it does not rely on a parametric distribution to fit the read counts. As a result, compared with parametric methods, DFseq can effectively handle genes with a bimodal-like distribution and/or excessive zero read counts, as well as genes with outlying observations. In addition, DFseq is an ideal platform for comparing the performance of different methods for detecting differential gene expression.
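The abstract does not spell out how DFseq constructs its two random variables. As a loose, hypothetical illustration of why a ratio of standard deviations can signal a group difference, the sketch below uses the classical decomposition of total variance into within-group and between-group parts: for each gene it compares the SD of all counts pooled across both groups with the size-weighted within-group SD. This is an assumed stand-in, not DFseq's construction. It does not model the mean-variance relationship or control a significance level, and the names `sigma_pair` and `sd_ratio` as well as the toy counts are made up for illustration.

```python
import math
import statistics


def sigma_pair(group_a, group_b):
    """Return (within_sd, total_sd) for one gene's read counts in two groups.

    Hypothetical illustration only: uses the classical decomposition
    total variance = within-group variance + between-group variance,
    not the random variables constructed by DFseq.
    """
    pooled = list(group_a) + list(group_b)
    n = len(pooled)
    # Population variance of all counts, ignoring group labels.
    total_var = statistics.pvariance(pooled)
    # Size-weighted average of the per-group population variances.
    within_var = (
        len(group_a) * statistics.pvariance(group_a)
        + len(group_b) * statistics.pvariance(group_b)
    ) / n
    return math.sqrt(within_var), math.sqrt(total_var)


def sd_ratio(group_a, group_b):
    """Ratio total_sd / within_sd: >= 1 (up to rounding), larger when group means differ."""
    within_sd, total_sd = sigma_pair(group_a, group_b)
    if within_sd == 0:
        # Counts are constant within each group (e.g. all zeros): the ratio is
        # undefined if the groups also agree, and unbounded if they differ.
        return math.inf if total_sd > 0 else math.nan
    return total_sd / within_sd


# Toy counts for three hypothetical genes (made-up numbers, not real data).
counts = {
    "geneA": ([10, 12, 11, 9], [11, 10, 12, 9]),  # similar groups
    "geneB": ([5, 6, 4, 5], [20, 22, 19, 21]),    # shifted group mean
    "geneC": ([0, 0, 0, 0], [0, 0, 0, 0]),        # all-zero counts
}

for gene, (a, b) in counts.items():
    print(gene, sd_ratio(a, b))
```

Plotting `within_sd` against `total_sd` for every gene gives a σ-σ-style scatter in which genes near the diagonal show little group effect and genes far above it have shifted group means; DFseq's actual plot, its handling of zero counts, and its calibration of the significance level may differ.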

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology
  • Databases, Nucleic Acid
  • Gene Expression Profiling / methods*
  • Humans
  • Lung Neoplasms / genetics
  • Lung Neoplasms / metabolism
  • RNA / chemistry
  • RNA / genetics
  • RNA / metabolism
  • Sequence Analysis, RNA / methods*
  • Transcriptome / genetics

Substances

  • RNA