Format

Send to:

Choose Destination
See comment in PubMed Commons below
Bioinformatics. 2010 Feb 15;26(4):493-500. doi: 10.1093/bioinformatics/btp692. Epub 2009 Dec 18.

RNA-Seq gene expression estimation with read mapping uncertainty.

Author information

  • 1Department of Computer Sciences, University of Wisconsin, Madison, WI 53706, USA.

Abstract

MOTIVATION:

RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than transcripts from which they are derived, a single read may map to multiple genes and isoforms, complicating expression analyses. Previous computational methods either discard reads that map to multiple locations or allocate them to genes heuristically.

RESULTS:

We present a generative statistical model and associated inference methods that handle read mapping uncertainty in a principled manner. Through simulations parameterized by real RNA-Seq data, we show that our method is more accurate than previous methods. Our improved accuracy is the result of handling read mapping uncertainty with a statistical model and the estimation of gene expression levels as the sum of isoform expression levels. Unlike previous methods, our method is capable of modeling non-uniform read distributions. Simulations with our method indicate that a read length of 20-25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed.

PMID:
20022975
[PubMed - indexed for MEDLINE]
PMCID:
PMC2820677
Free PMC Article
PubMed Commons home

PubMed Commons

0 comments
How to join PubMed Commons

    Supplemental Content

    Full text links

    Icon for HighWire Icon for PubMed Central
    Loading ...
    Write to the Help Desk