Accurate quantification of transcriptome from RNA-Seq data by effective length normalization

Soohyun Lee; Chae Hwa Seo; Byungho Lim; Jin Ok Yang; Jeongsu Oh; Minjin Kim; Sooncheol Lee; Byungwook Lee; Changwon Kang; Sanghyuk Lee

doi:10.1093/nar/gkq1015

Nucleic Acids Res. 2011 Jan;39(2):e9. doi: 10.1093/nar/gkq1015. Epub 2010 Nov 8.

Authors

Soohyun Lee¹, Chae Hwa Seo, Byungho Lim, Jin Ok Yang, Jeongsu Oh, Minjin Kim, Sooncheol Lee, Byungwook Lee, Changwon Kang, Sanghyuk Lee

Affiliation

¹ Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Yuseong-gu, Daejeon, Korea.

Abstract

We propose a novel, efficient and intuitive approach to estimating mRNA abundances from whole transcriptome shotgun sequencing (RNA-Seq) data. Our method, NEUMA (Normalization by Expected Uniquely Mappable Area), is based on effective length normalization using the uniquely mappable areas of gene and mRNA isoform models. Using a known transcriptome sequence model such as RefSeq, NEUMA pre-computes the numbers of all possible gene-wise and isoform-wise informative reads: the former are sequences that map exclusively to all mRNA isoforms of a single gene, and the latter are sequences that map uniquely to a single mRNA isoform. These numbers are used to estimate the effective lengths of genes and transcripts, taking the experimental distribution of fragment sizes into consideration. Quantitative RT-PCR of 27 randomly selected genes in two human cell lines, together with computer simulation experiments, demonstrated that NEUMA is more accurate than other recently developed methods. NEUMA covers a large proportion of genes and mRNA isoforms and offers, for each gene, a measure of consistency (the 'consistency coefficient') between the independently measured gene-wise level and the sum of the isoform levels. NEUMA is applicable to both paired-end and single-end RNA-Seq data. We propose that NEUMA could serve as a standard method for quantifying gene transcript levels from RNA-Seq data.
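The idea of counting informative reads and normalizing by an effective length can be illustrated with a small Python sketch. This is not the NEUMA implementation: it assumes single-end reads of a fixed length k, exact matching, and distinct k-mers as a stand-in for mappable positions, and it ignores the fragment size distribution that the abstract says NEUMA takes into account. The function names `informative_lengths` and `normalize` and the toy sequences are hypothetical.

```python
from collections import defaultdict


def informative_lengths(transcripts, gene_of, k):
    """Count informative k-mers per isoform and per gene (toy model).

    transcripts: dict isoform_id -> sequence
    gene_of:     dict isoform_id -> gene_id
    k:           assumed fixed read length
    """
    # Record which isoforms contain each distinct k-mer.
    owners = defaultdict(set)
    for iso, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(iso)

    # Group isoforms by gene so "all isoforms of a gene" can be tested.
    isoforms_of = defaultdict(set)
    for iso, gene in gene_of.items():
        isoforms_of[gene].add(iso)

    iso_len = defaultdict(int)
    gene_len = defaultdict(int)
    for kmer, hit in owners.items():
        # Isoform-wise informative: found in exactly one isoform.
        if len(hit) == 1:
            iso_len[next(iter(hit))] += 1
        # Gene-wise informative: found in every isoform of one gene
        # and in no isoform of any other gene.
        genes = {gene_of[iso] for iso in hit}
        if len(genes) == 1:
            gene = next(iter(genes))
            if hit == isoforms_of[gene]:
                gene_len[gene] += 1
    return dict(iso_len), dict(gene_len)


def normalize(counts, eff_len):
    """Divide informative read counts by the matching effective length."""
    return {key: counts[key] / eff_len[key]
            for key in counts if eff_len.get(key, 0) > 0}


# Toy transcriptome: gene A has two isoforms that differ at the 3' end.
transcripts = {"A1": "ACGTACGGTT", "A2": "ACGTACGGAA", "B1": "TTTTCCCCGG"}
gene_of = {"A1": "A", "A2": "A", "B1": "B"}

iso_len, gene_len = informative_lengths(transcripts, gene_of, k=4)
gene_level = normalize({"A": 10, "B": 7}, gene_len)
```

In this toy case the five k-mers shared by A1 and A2 count toward the gene-wise length of A, while only the two k-mers at each distinct 3' end count toward the isoform-wise lengths. Isoform levels can be much harder to estimate than gene levels when isoforms overlap heavily, because little sequence remains that identifies a single isoform.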

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Cell Line
Computer Simulation
Gene Expression Profiling / methods*
Gene Expression Profiling / standards
Humans
Polymerase Chain Reaction
Protein Isoforms / genetics
RNA, Messenger / analysis*
RNA, Messenger / chemistry
Reproducibility of Results
Sequence Analysis, RNA*

Substances

Protein Isoforms
RNA, Messenger