Stat Appl Genet Mol Biol. 2013 Apr 16;12(2):175-88. doi: 10.1515/sagmb-2012-0049.

Exploring the sampling universe of RNA-seq.

Author information

Center for Integrative Bioinformatics, Max F Perutz Laboratories, University of Vienna and Medical University of Vienna, Vienna, Austria.


How deep is deep enough? Although RNA sequencing is a well-established technology, the sequencing depth required to detect all expressed genes is not known. If we set aside the biological overhead and meta-information, we are dealing with a classical sampling process. Such sampling processes are well known from population genetics and have been thoroughly investigated. Here we use the Pitman Sampling Formula to model the sampling process of RNA sequencing. In doing so, we characterize the sampling by means of two parameters that capture the combined effect of different sequencing technologies, protocols and their associated biases. We distinguish between two levels of sampling: the number of reads per gene and the number of reads starting at each position of a specific gene. The latter allows us to evaluate the theoretical expectation of uniform coverage and the performance of sequencing protocols in that respect. Most importantly, given a pilot sequencing experiment, we provide an estimate of the size of the underlying sampling universe and, based on these findings, evaluate an estimator for the number of newly detected genes when sequencing an additional sample of arbitrary size.
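To make the sampling view concrete, the following is a minimal Python sketch of the sequential (predictive) rule commonly associated with the two-parameter Pitman Sampling Formula. It is an illustration under stated assumptions, not the estimator evaluated in the paper. The parameter names `theta` and `sigma`, the function names, and the choice of a finite universe (`sigma < 0` with `theta = m * |sigma|` for `m` classes) are assumptions made here for illustration. Under this rule, after `n` reads covering `k` distinct genes, the next read hits a new gene with probability `(theta + k * sigma) / (theta + n)`.

```python
import random


def expected_new(n, k, extra, theta, sigma):
    """Expected number of newly seen classes in `extra` further draws,
    given n draws so far covering k distinct classes.

    Uses the predictive rule P(new) = (theta + k*sigma) / (theta + n).
    Because P(new) is linear in k, iterating on the expectation is exact.
    Assumes theta > 0 and sigma < 1 (illustrative parameterization).
    """
    current = float(k)
    for i in range(n, n + extra):
        current += (theta + sigma * current) / (theta + i)
    return current - k


def expected_distinct(n, theta, sigma):
    """Expected number of distinct classes after n draws from scratch."""
    return expected_new(0, 0, n, theta, sigma)


def simulate(n, theta, sigma, seed=None):
    """Simulate n draws; return the list of per-class read counts."""
    rng = random.Random(seed)
    counts = []
    for i in range(n):
        k = len(counts)
        p_new = (theta + k * sigma) / (theta + i)
        if rng.random() < p_new:
            counts.append(1)  # first read of a previously unseen class
        else:
            # An existing class j is chosen with weight (n_j - sigma).
            weights = [c - sigma for c in counts]
            j = rng.choices(range(k), weights=weights)[0]
            counts[j] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical pilot: a universe of 5 classes (sigma = -1, theta = 5).
    pilot = simulate(50, theta=5.0, sigma=-1.0, seed=1)
    print("classes seen in pilot:", len(pilot))
    print("expected new classes in 100 more draws:",
          expected_new(50, len(pilot), 100, theta=5.0, sigma=-1.0))
```

In this sketch, per-gene read counts play the role of `counts`; the same rule could equally be applied to read start positions within a single gene. With `sigma = 0` the rule reduces to the one-parameter Ewens case, and with the finite-universe parameterization the probability of a new class drops to zero once all `m` classes have been observed.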

[Indexed for MEDLINE]
