A structured sparse regression method for estimating isoform expression level from multi-sample RNA-seq data

Genet Mol Res. 2016 Jun 3;15(2). doi: 10.4238/gmr.15027670.

Abstract

With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, existing expression estimation methods usually process each RNA-seq sample individually and ignore the fact that read distributions are consistent across samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression from multi-sample RNA-seq data. SSRSeq uses a nonparametric model to capture the general tendency of the non-uniform read distribution for all genes across multiple samples. In addition, our method adds a structured sparse regularization that not only incorporates the sparsity structure between a gene and the expression levels of its isoforms, but also reduces the effect of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method for isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance across multiple samples and produced more accurate isoform expression estimates, and thus more meaningful biological interpretations.
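The idea of a structured sparse penalty over genes and their isoforms can be illustrated with a small sketch. The abstract does not give SSRSeq's exact objective, so the following assumes a least-squares fit with a non-negative sparse group lasso penalty (one group per gene, one coefficient per isoform), solved by proximal gradient descent. The design matrix `X`, the functions `prox_nonneg_sparse_group` and `estimate_isoforms`, the penalty weights, and the toy data are all hypothetical. The multi-sample nonparametric read-distribution model is not shown; it is assumed to be already folded into the entries of `X`.

```python
from typing import Dict, List

# Hypothetical sketch: NOT the published SSRSeq objective. Assumes
#   minimize 0.5 * ||y - X theta||^2 + lam1 * ||theta||_1
#            + lam2 * sum_g ||theta_g||_2   subject to theta >= 0,
# where each group g holds the isoform columns of one gene and X is assumed
# to already encode a shared (multi-sample) read-distribution bias model.


def prox_nonneg_sparse_group(z: List[float], groups: Dict[str, List[int]],
                             t: float, lam1: float, lam2: float) -> List[float]:
    """Proximal step for t * (L1 + group-L2 penalty) with theta >= 0."""
    # Non-negative soft-thresholding handles the L1 term and the constraint.
    u = [max(zi - t * lam1, 0.0) for zi in z]
    x = list(u)
    # Group shrinkage: scale each gene's isoform block toward zero.
    for idx in groups.values():
        norm = sum(u[i] ** 2 for i in idx) ** 0.5
        scale = max(0.0, 1.0 - t * lam2 / norm) if norm > 0.0 else 0.0
        for i in idx:
            x[i] = scale * u[i]
    return x


def estimate_isoforms(X: List[List[float]], y: List[float],
                      groups: Dict[str, List[int]], lam1: float = 0.1,
                      lam2: float = 0.1, iters: int = 2000) -> List[float]:
    """Proximal gradient descent on the penalized least-squares objective."""
    n_rows, n_cols = len(X), len(X[0])
    # Step size 1/L, with L bounded above by the squared Frobenius norm of X.
    t = 1.0 / sum(v * v for row in X for v in row)
    theta = [0.0] * n_cols
    for _ in range(iters):
        # Residual X theta - y and gradient X^T (X theta - y).
        resid = [sum(X[r][c] * theta[c] for c in range(n_cols)) - y[r]
                 for r in range(n_rows)]
        grad = [sum(X[r][c] * resid[r] for r in range(n_rows))
                for c in range(n_cols)]
        z = [theta[c] - t * grad[c] for c in range(n_cols)]
        theta = prox_nonneg_sparse_group(z, groups, t, lam1, lam2)
    return theta


# Toy example: rows are exon segments, columns are isoforms
# (gene G1 has isoforms 0 and 1, gene G2 has isoform 2);
# y holds the observed read counts per segment.
X = [[1.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
y = [5.0, 5.0, 0.0, 2.0]
groups = {"G1": [0, 1], "G2": [2]}
theta = estimate_isoforms(X, y, groups, lam1=0.2, lam2=0.1)
print([round(v, 4) for v in theta])  # [4.85, 0.0, 1.7]
```

In this toy case no reads fall on the segment unique to isoform 1, so the L1 term drives its estimate exactly to zero, while the group term shrinks each gene's isoform block as a unit. A real estimator might instead use a count-based (for example Poisson) likelihood rather than squared error.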

MeSH terms

  • Algorithms
  • Computational Biology
  • Computer Simulation
  • Gene Expression / genetics
  • High-Throughput Nucleotide Sequencing
  • Protein Isoforms / biosynthesis
  • Protein Isoforms / genetics*
  • RNA / biosynthesis
  • RNA / genetics*
  • Regression Analysis
  • Sequence Analysis, RNA*
  • Software

Substances

  • Protein Isoforms
  • RNA