The minimal intrinsic stochasticity of constitutively expressed eukaryotic genes is sub-Poissonian

Stochastic variation in gene products (“noise”) is an inescapable by-product of gene expression. Noise must be minimized to allow for the reliable execution of cellular functions. However, noise cannot be suppressed beyond an intrinsic lower limit. For constitutively expressed genes, this limit is believed to be Poissonian, meaning that the variance in mRNA numbers cannot be lower than their mean. Here, we show that several cell division genes in fission yeast have mRNA variances significantly below this limit, which cannot be explained by the classical gene expression model for low-noise genes. Our analysis reveals that multiple steps in both transcription and mRNA degradation are essential to explain this sub-Poissonian variance. The sub-Poissonian regime differs qualitatively from previously characterized noise regimes, a hallmark being that cytoplasmic noise is reduced when the mRNA export rate increases. Our study re-defines the lower limit of eukaryotic gene expression noise and identifies molecular requirements for ultra-low noise which are expected to support essential cell functions.


Section A: Statistical tests of size-corrected Fano factors A.1 Statistical methodology
For statistical conclusions on the size-corrected Fano factors, we used a hierarchical bootstrapping approach. Experimental replicates (shown separately in the main figures) and cells within replicates were resampled with replacement. From this, the statistic of interest (average Fano factor across experimental replicates, difference between genotypes, average difference in Fano factor between mono-and binucleated cells, or average difference in Fano factor between cellular compartments) was calculated. To estimate the statistic's distribution, 5,000 bootstrap replicates were performed, and the mean and 95 % confidence interval (2.5 th to 97.5 th percentile) were calculated from the bootstrap results. Up to 0.5 % of bootstrap replicates were allowed to fail to calculate (i.e., returned NaN), and these were discarded before calculating the mean and confidence interval. If > 0.5 % of bootstrap replicates failed, no statistical results were reported for that analysis. The latter was only the case for mature nuclear mRNA from mad1 in binucleated cells (TS-labelled data), due to a low cell number in one replicate. A genotype was considered to have a size-corrected RNA distribution significantly different from Poisson if the bootstrap 95 % confidence interval excluded 1, and differences were considered significant if the confidence interval excluded 0.

A.3 Comparison of Fano factors in cellular compartments
We tested if size-corrected Fano factors differ between cellular compartments using hierarchical bootstrapping. Mean and 95 % confidence intervals from bootstrapping the mean difference between compartments are provided. Confidence intervals that exclude 0 are interpreted as significantly different Fano factors between compartments.

Table A-I: Comparison between nuclear and cytoplasmic Fano factors (non-TS labelled)
Comparing nuclear and cytoplasmic Fano factors in experiments without transcription-site labelling (see Fig. 6 and A-I). A positive mean Fano factor difference indicates a larger Fano factor in the cytoplasm than the nucleus, while a negative mean Fano factor difference indicates a smaller Fano factor in the cytoplasm than the nucleus.

A.4 Comparison of Fano factors between mononucleated and binucleated cells
We tested if size-corrected Fano factors differ between mono-and binucleated cells using hierarchical bootstrapping. Mean and 95 % confidence intervals from bootstrapping the mean difference are provided. Confidence intervals that exclude 0 are interpreted as a significant difference in Fano factors.

Table A-IV: Comparison of Fano factors between binucleated and mononucleated cells (non-TS labelled)
Comparing Fano factors from mono-and binucleated cells in experiments without transcription-site labelling (see Fig. 6 and A-I). A positive mean Fano factor difference indicates a larger Fano factor in binucleated than mononucleated cells, while a negative mean Fano factor difference indicates a smaller Fano factor in binucleated than mononucleated cells.

Table A-V: Comparison of Fano factors between binucleated and mononucleated cells (TS-labelled)
Comparing Fano factors from mono-and binucleated cells in experiments with transcription-site labelling (see Fig. 7 and A-III). A positive mean Fano factor difference indicates a larger Fano factor in binucleated than mononucleated cells.

A.5 Comparison of Fano factors between endogenous and exogenous locus and with different coding sequences
We tested if size-corrected Fano factors for mad2 and mad3 differ between endogenous and exogenous locus, and after exchanging the coding sequence, using hierarchical bootstrapping.
Mean and 95 % confidence intervals from bootstrapping the mean difference are provided.
Confidence intervals that exclude 0 are interpreted as a significant difference in Fano factors.  (10), and deterministic elongation has been observed in budding yeast (11). Similar models have been proposed (6,(12)(13)(14). Note that in this model there is no transcriptionally inactive state in the sense that each state is associated with a particular stage of the transcription process.
Nuclear export is modelled by the reaction with rate kC.. This process is modelled by a single rate-limiting step. 1 is to be interpreted as the mRNA upon its entrance into the cytoplasm.
Degradation of mRNA in the cytoplasm is modelled by the chain of reactions Cytoplasmic mRNA degradation in eukaryotes is a complex multi-step process (15,16) that we simplify to the chain of reactions above. We are not aware of specific data for the number of rate-limiting steps R in the degradation process. In Deneke et al., 2013 (16), it was shown that choosing R = 5 can fit data in yeast well, though other values may provide an equally good fit. The rate of switching from → +1 is given by kD. Note that the measured cytoplasmic mRNA is assumed to be the sum of cytoplasmic mRNA at any stage of its cytoplasmic lifecycle, i.e. = ∑ =1 .

B.2 General properties of the model
The model is characterized by four rate constants kA, kB, kC and kD, the number of rate-limiting steps in the initiation process (S), and the number of rate-limiting steps in cytoplasmic degradation The mean time between two successive nascent mRNA production events is given by: The mean time for nuclear export is: The mean time for cytoplasmic degradation is:

=
The chemical master equation of the model is difficult to solve exactly, even in steady-state conditions. This is not unexpected since exact solutions for the joint distributions of molecule numbers are only known for a handful of special cases (17). However, because all reactions are first order, the propensities of the chemical master equation are linear in the molecule numbers.
This implies that the means, variances and covariances of all species can be obtained exactly in closed-form using the linear-noise approximation (17). The matrix form of this method (18) is particularly useful because it makes computations using a computer algebra system such as Mathematica easy for any number of species (which can be arbitrarily large for our model depending on the values of S and R).
As in the telegraph model of gene expression (19), in steady-state conditions the moments of nuclear and cytoplasmic mRNA are functions not of the absolute values of the rate constants but rather of the rate constants normalized by the degradation rate, i.e. kA / kD, kB / kD, kC / kD. Hence, without loss of generality we set kD = R such that is fixed to unity; in other words, this choice non-dimensionalizes time by dividing it by the mean time it takes for a cytoplasmic mRNA to decay.
It is straightforward to derive the first-order moments of nuclear and cytoplasmic mRNA numbers per gene copy. These are respectively given by: = ( + ( −1)) and = + ( −1) .
The closed-form solutions for the second-order moments of mRNA numbers were obtained using Mathematica. They are very complex and hence we do not show them here. Instead, we state properties of the stochastic model that follow from an analysis of these solutions: For all parameters, the Fano factors of nuclear mRNA and of cytoplasmic mRNA are found to be less than or equal to 1. For S = 1, they are exactly equal to 1. As shown in previous publications (13,14), the crucial ingredient of the model that leads to sub-Poisson noise is the switching of the  (20,21). Mathematically, the switching of the promoter state causes sub-Poisson noise in the transcript numbers because the coefficient of variation of the time between two successive nascent mRNA production events is less than 1, i.e. less than expected from an exponential distribution.
Note that if there was no state switching upon the production of a nascent transcript then the Fano factors of nascent, nuclear and cytoplasmic mRNA would be greater than or equal to 1 (22,23). In models of this type, the interpretation is that U1 to US-1 are transcriptional off states while US is a transcriptionally active state. Nascent mRNA can be continuously produced from this state and there is no associated promoter clearing.
In steady-state conditions, the dynamics of nascent mRNA elongation and release do not influence the moments of the nuclear and cytoplasmic mRNA in our model. This is because we are assuming that release happens a certain fixed time T after initiation of elongation, and hence the time between two subsequent nascent mRNA production events is the same as the time between two subsequent mature nuclear mRNA production events. In other words, the moments of nuclear and cytoplasmic mRNA in steady-state conditions remain the same if we instead modeled transcription as where the last reaction has rate constant kB.
In steady-state conditions, the Fano factor of cytoplasmic mRNA can either be larger or smaller than the Fano factor of nuclear mRNA. From the closed-form solutions it is found that in the limit goes to infinity, the Fano factor of cytoplasmic mRNA is smaller than that of nuclear mRNA.
The exact value of when the two Fano factors are equal depends on the values of , , S and R.
The minimum Fano factor of nuclear mRNA is achieved when = ≫ , in other words when the rates for promoter remodeling and promoter freeing are in the same range and considerably faster than export. The minimum Fano factor of cytoplasmic mRNA is achieved in the limits ≫ = ≫ , in other words when the rates for promoter remodeling and promoter freeing are in the same range and considerably faster than degradation but slower than export. The minimum values possible for a model with S and R each between 1 and 4 are shown in Table B-I and Fig. 7G.

B.3 Bayesian model selection and parameter inference
While it is immediately clear from Table B-I that models with one rate limiting step for transcription will not be able to explain the sub-Poisson nature of the data, it is an open question which of the models with two or more rate-limiting steps in transcription and one or more rate-limiting steps in degradation best fits the data. To answer this question, we used a simple ABC (Approximate Bayesian Computation) rejection sampler (24) to perform Bayesian model selection (25).
For each gene and for a given model with certain values of S and R, the ABC algorithm randomly samples parameters , , from a prior distribution. Since we do not have prior knowledge about the 3 parameters, we simply choose the distribution to be uniform over a certain range.
Stochastic simulations of the model with the selected parameters are run using Gillespie's exact algorithm (26) until steady-state is achieved. Then a distance function between the moments of the data and the model is evaluated. Note that the number of samples of Gillespie's algorithm equals the number of single-cell measurements for a given gene. This constraint makes sure that the simulation-generated moments have the same variability (due to finite number of samples) as the moments calculated from the data. Those parameters whose distance function is less than a certain threshold are accepted. The mean and standard deviation of the marginal distribution of these accepted parameters (the posterior) provides an estimate of the values of the parameters and their uncertainty. We also calculate the acceptance rate, which is the number of parameter sets that are accepted divided by the total number of parameter sets that were randomly generated by the ABC algorithm. We then select which model best selects the data for a given gene by comparing the acceptance rate of any two models. The Bayes factor of two models A and B is defined as the ratio of the acceptance rate of the two models (25). If the Bayes factor is larger than 10, we conclude that there is strong evidence in favor of model A; if the Bayes factor is less than 1/10, we conclude that there is strong evidence in favor of model B (27). Otherwise, if the Bayes factor is between 1/10 and 10, we will conclude that there is no strong evidence for model A or model B (there might be weak evidence).
The heart of the ABC algorithm involves accepting only those parameters which satisfy a userdefined criterion. In our case the criterion is that the following distance function is smaller than a preset threshold value: Therefore, we multiply the model's estimate of the mean by 2 in the expression for the distance function. The Fano factor is unchanged because both the mean and variance are doubled.
Note that the distance function does not have information about the mean of nuclear mRNA but only about its Fano factor. This choice is motivated by a direct comparison of fitting to the data with or without the transcription site label for the two genes, mad1 and rpb1. When the transcription site is not labelled (non-TS labelled), it is uncertain which fraction of the nuclear RNA is nascent RNA and which fraction is mature nuclear RNA. Not making this distinction leads to a considerably different estimate for the mean of nuclear RNA, , from non-TS labelled data compared to that estimated from TS-labelled data (Table B-II). The relative error is around 200 %.
In contrast, estimating the nuclear Fano factor, , is more robust (relative error less than 20 %).
Furthermore, the estimates for both cytoplasmic mean and cytoplasmic Fano factor are similar, regardless of which data is used. We therefore excluded the mean of nuclear RNA, , from the distance function. The only remaining information to be specified for the ABC rejection sampler is the choice of the ranges over which the parameters are randomly sampled and the threshold values ( ) of the distance function for each gene (Table B-III). We have not included cdc13 in this analysis because its data is consistent with a Poisson distribution (Section A.4). Theoretically it is known that the distribution of accepted parameters from an ABC rejection sampler will be a good approximation to the true posterior distribution provided the threshold value is sufficiently small (25). A threshold value is hence chosen small enough so that repeating with a smaller value does not significantly change the posterior; this value will necessarily be different from one gene to another, particularly given the variable sample sizes. The range of the uniform prior distribution was initially chosen to be (0,250) for , , . We noticed that in many cases initial runs of the ABC rejection sampler identified parameters satisfying the user-defined criterion only in small regions of this range.
Hence to obtain a large number of accepted parameters within a computationally reasonable time (which is needed to obtain a well-defined posterior distribution) it was necessary to limit the range of the prior distributions to a subset of numbers within (0,250) which the initial exploration identified as containing the vast majority of the accepted parameters. We implemented the constraint that, for a given parameter set, the 12 models should be indistinguishable with respect to the mean nuclear and cytoplasmic mRNA. This is achieved by scaling by a factor proportional to − 1 (following from the equations for the first-order moments above). Note that = .
Gene  Table B-XI. We find strong evidence that mad1, mad1-GFP, and bub1-GFP are best described by models with 3 or 4 ratelimiting steps in transcription and 2 -4 rate-limiting steps in degradation. In contrast, for the rest of the genes (mad2-GFP, mad3-GFP, rpb1, sep1-GFP), most of the 12 models fit the data equally well. We also note that the models with S = 3 and R = 3 or 4, as well as S = 4 and R = 2, are the only models that can explain the data from all genes.

Table B-IV: Bayes factors for mad1
The value in the i th row and j th column is the Bayes factor of models i and j. Bayes factors ≥ 10 are shown in blue (model i explains the data better than model j); Bayes factors ≤ 1/10 are shown in red (model j explains the data better than model i); Bayes factors in the range (1/10,10) are shown in black, which is interpreted as the two models being indistinguishable. Those models which explain the data best, i.e. those lacking Bayes factors ≤ 1/10, are shaded in yellow.        which is remarkable given that the TS-labelled data was not used to estimate parameters.
Note that the posterior distributions are not uniform (as the priors), but peaked ( Figure B-I), implying that the inference procedure was successful at finding optimal parameter values for each gene. The identified parameters varied across genes ( Figure B-I,B-II).

B.4 Testing the ansatz of Poisson fluctuations
We 1)) = 0.975. In Table B-XIV we show the computation of the 95 % confidence interval for the Fano factor for all genes. Note that cdc13 is the only gene whose nuclear and cytoplasmic Fano factors both fall within the confidence intervals (also see Fig. A-I). On this basis, we cannot rule out that this gene has Poissonian expression characteristics, and hence we have not used data from this gene when fitting to the model. We note that for some genes their nuclear Fano factor falls within the confidence intervals whereas their cytoplasmic Fano factor does not (Table B-XIV). This might be expected due to the relationship between nuclear and cytoplasmic Fano factors when the nuclear export rate is high (Fig. 7H).