The expected proportion of informative reads increases with genetic divergence and read length. (A, B) Black lines show the expected proportions of informative reads (i.e. sequence fragments that can be unambiguously assigned to one allele) predicted for transcribed sequences containing 0.1, 0.5, 1 or 5% sequence divergence, as indicated. Predictions are shown for cases in which either one single nucleotide polymorphism (SNP) (A) or two SNPs (B) were required for a sequencing read to be informative for measuring allelic expression. (C, D) Predictions based on 0.1% and 1% sequence divergence and requiring only one SNP per informative read are repeated from (A), together with results from simulated data sets. Each simulation contained either 20 (C) or 200 (D) reads generated from a virtual 2000 bp mRNA sequence with 0.1% or 1% sequence divergence and sequencing reads of 35, 150, 300 or 800 bp. Each scenario was simulated 500 times and is summarized by boxplots showing the median, the lower and upper quartiles, as well as 1.5 times the interquartile range. The gray lines are the 95% confidence intervals of the expected proportions based on binomial sampling (Clopper-Pearson interval).
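The quantities in this caption can be illustrated with a minimal sketch. It does not reproduce the authors' equation or simulation code, which are not given here. It assumes that every site differs between alleles independently with probability equal to the sequence divergence, so that the number of SNPs in a read is binomial, and that reads start at uniformly random positions on the mRNA. All function names and parameters are hypothetical.

```python
import math
import random


def expected_informative(divergence, read_len, min_snps=1):
    """P(read covers at least `min_snps` SNPs).

    Assumption (not from the source): each site is a SNP independently
    with probability `divergence`.
    """
    p_fewer = sum(
        math.comb(read_len, k) * divergence**k * (1 - divergence) ** (read_len - k)
        for k in range(min_snps)
    )
    return 1.0 - p_fewer


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided interval for k successes in n trials.

    Solved by bisection on the binomial CDF, which decreases as p grows.
    """
    def solve(successes, target):
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if binom_cdf(successes, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(k - 1, 1 - alpha / 2)
    upper = 1.0 if k == n else solve(k, alpha / 2)
    return lower, upper


def simulate_proportion(n_reads, divergence, read_len, mrna_len=2000, rng=random):
    """One simulated data set: fraction of reads covering at least one SNP.

    Assumption: SNP sites are drawn afresh for each data set, and reads
    start uniformly at any position where they fit on the mRNA.
    """
    snps = [i for i in range(mrna_len) if rng.random() < divergence]
    informative = 0
    for _ in range(n_reads):
        start = rng.randrange(mrna_len - read_len + 1)
        if any(start <= s < start + read_len for s in snps):
            informative += 1
    return informative / n_reads


if __name__ == "__main__":
    rng = random.Random(1)
    for read_len in (35, 150, 300, 800):
        expected = expected_informative(0.01, read_len)
        sims = [simulate_proportion(20, 0.01, read_len, rng=rng) for _ in range(500)]
        print(read_len, round(expected, 3), round(sum(sims) / len(sims), 3))
```

Under these assumptions, the one-SNP prediction reduces to 1 - (1 - d)^L for divergence d and read length L, which rises with both, as the caption states. The bisection-based interval avoids any dependency beyond the standard library; a statistics package with an exact binomial interval could be substituted.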
