Stat Appl Genet Mol Biol. 2013 Apr 16;12(2):175-88. doi: 10.1515/sagmb-2012-0049.

Exploring the sampling universe of RNA-seq.

Author information

  • Center for Integrative Bioinformatics, Max F Perutz Laboratories, University of Vienna and Medical University of Vienna, Vienna, Austria. stefanie.tauber@univie.ac.at

Abstract

How deep is deep enough? Although RNA-sequencing is a well-established technology, the sequencing depth required to detect all expressed genes is not known. If we set aside the biological overhead and meta-information, we are dealing with a classical sampling process. Such sampling processes are well known from population genetics and have been thoroughly investigated. Here we use the Pitman Sampling Formula to model the sampling process of RNA-sequencing. In doing so, we characterize the sampling by means of two parameters that capture the conglomerate of different sequencing technologies, protocols and their associated biases. We distinguish between two levels of sampling: the number of reads per gene and the number of reads starting at each position of a specific gene. The latter approach allows us to evaluate the theoretical expectation of uniform coverage and the performance of sequencing protocols in that respect. Most importantly, given a pilot sequencing experiment, we provide an estimate of the size of the underlying sampling universe and, based on these findings, evaluate an estimator for the number of newly detected genes when sequencing an additional sample of arbitrary size.
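
To make the two-parameter sampling idea concrete, the following is a minimal sketch of the standard sequential (predictive) rule associated with the Pitman Sampling Formula. It is not taken from the paper: the function names simulate_counts and expected_distinct, the parameter names alpha and theta, and all numeric values are assumptions chosen for illustration. Under this rule, after i reads spread over k detected genes, the next read hits a new gene with probability (theta + k*alpha)/(theta + i) and an already detected gene j with probability (n_j - alpha)/(theta + i). With alpha < 0 and theta = m*|alpha|, the number of distinct genes can never exceed m, which is one way to picture a finite sampling universe.

```python
import random


def simulate_counts(n, alpha, theta, rng):
    """Draw n reads sequentially; return the read count per detected gene.

    Assumes alpha < 1 and theta > -alpha (hypothetical parameter names).
    """
    counts = []
    for i in range(n):
        k = len(counts)
        # The first read always detects a new gene.
        p_new = 1.0 if i == 0 else (theta + k * alpha) / (theta + i)
        if rng.random() < p_new:
            counts.append(1)
        else:
            # Existing gene j is chosen with weight n_j - alpha.
            weights = [c - alpha for c in counts]
            j = rng.choices(range(k), weights=weights)[0]
            counts[j] += 1
    return counts


def expected_distinct(n, alpha, theta):
    """Expected number of distinct genes after n reads.

    Uses the recursion E[K_{i+1}] = E[K_i] + (theta + alpha*E[K_i])/(theta + i),
    which is exact because the new-gene probability is linear in K_i.
    """
    if n == 0:
        return 0.0
    e = 1.0
    for i in range(1, n):
        e += (theta + alpha * e) / (theta + i)
    return e


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative values only: a finite universe of m = 50 genes.
    counts = simulate_counts(2000, alpha=-1.0, theta=50.0, rng=rng)
    print(len(counts), expected_distinct(2000, -1.0, 50.0))
```

Comparing expected_distinct at two depths n and n + m gives a rough sense of how many additional genes a deeper run would be expected to detect under fixed parameters; the estimators evaluated in the paper itself may differ from this sketch.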

PMID: 23629158
DOI: 10.1515/sagmb-2012-0049
[PubMed - indexed for MEDLINE]