Methods. 2014 Oct 1;69(3):306-14. doi: 10.1016/j.ymeth.2014.06.004. Epub 2014 Jun 26.

DegPack: a web package using a non-parametric and information theoretic algorithm to identify differentially expressed genes in multiclass RNA-seq samples.

Author information

  • 1 Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea.
  • 2 Bioinformatics Institute, Seoul National University, Seoul, Republic of Korea.
  • 3 Department of Computer Science, School of Informatics and Computing, Indiana University, Bloomington, IN, USA.
  • 4 Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea; Bioinformatics Institute, Seoul National University, Seoul, Republic of Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea. Electronic address: sunkim.bioinfo@snu.ac.kr.

Abstract

Gene expression across the whole cell can be measured routinely with microarray technologies or, more recently, with sequencing technologies. With these technologies, identifying differentially expressed genes (DEGs) among multiple phenotypes is the first step toward understanding the differences between phenotypes. Accordingly, many methods have been developed for detecting DEGs between two groups. For example, the t-test and relative entropy are used to detect differences between two probability distributions. When more than two phenotypes are considered, these methods are not applicable, and other methods such as the ANOVA F-test and the Kruskal-Wallis test are used to find DEGs in multiclass data. However, the ANOVA F-test assumes normally distributed data and is not designed to identify DEGs whose expression is distinct in each phenotype. The Kruskal-Wallis test, a non-parametric method, is more robust but is sensitive to outliers. In this paper, we propose a non-parametric, information-theoretic approach for identifying DEGs. Our method identified DEGs effectively and was shown to be less sensitive to outliers in two data sets: a three-class drought-resistant rice data set and a three-class breast cancer data set. In extensive experiments with simulated and real data, our method outperformed existing tools in how accurately the identified DEGs characterize the phenotypes. A web service for the analysis of multiclass data is available at http://biohealth.snu.ac.kr/software/degpack; in addition to the method described in this paper, it includes the SAMseq and PoissonSeq methods.
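
The abstract names the Kruskal-Wallis test as a standard non-parametric baseline for multiclass DEG detection. As a minimal sketch, the H statistic for a single gene measured in k classes can be computed from the ranks of the pooled expression values. Python and the function names `average_ranks` and `kruskal_wallis_h` are illustrative choices, not part of DegPack. This is the textbook test without the usual tie correction, not the DegPack algorithm, whose details the abstract does not give.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for one gene; groups is a list of per-class expression lists.

    Sketch only: no tie correction is applied.
    """
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# One hypothetical gene in three classes (made-up values).
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Because H depends only on the ranks of the pooled values, any strictly increasing transformation of the expression values leaves it unchanged, whereas the ANOVA F-test works on the raw values. For large samples, H is usually compared against a chi-square distribution with k - 1 degrees of freedom.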

Copyright © 2014 Elsevier Inc. All rights reserved.

KEYWORDS:

Differentially expressed genes; Information theoretic algorithm; Multiclass; Non-parametric algorithm; RNA-seq

PMID:
24981074
[PubMed - indexed for MEDLINE]