
# Fundamental limits on the suppression of molecular fluctuations

^{1}Department of Engineering, University of Cambridge

^{2}Department of Systems Biology, Harvard University

Users may view, print, copy, download and text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms

## Abstract

Negative feedback is common in biological processes and can increase a system’s stability to internal and external perturbations. But at the molecular level, control loops always involve signaling steps with finite rates for random births and deaths of individual molecules. By developing mathematical tools that merge control and information theory with physical chemistry we show that seemingly mild constraints on these rates place severe limits on the ability to suppress molecular fluctuations. Specifically, the minimum standard deviation in abundances decreases with the quartic root of the number of signaling events, making it extraordinarily expensive to increase accuracy. Our results are formulated in terms of experimental observables, and existing data show that cells use brute force when noise suppression is essential, e.g. transcribing regulatory genes 10,000s of times per cell cycle. The theory challenges conventional beliefs about biochemical accuracy and presents an approach to rigorously analyze poorly characterized biological systems.

Life in the cell is a complex battle between randomizing and correcting statistical forces: births and deaths of individual molecules create spontaneous fluctuations in abundances^{1}^{,}^{2}^{,}^{3}^{,}^{4} – noise – while many control circuits have evolved to eliminate, tolerate or exploit the noise^{5}^{,}^{6}^{,}^{7}^{,}^{8}. The net outcome is difficult to predict because each control circuit in turn consists of probabilistic chemical reactions. For example, negative feedback loops can compensate for changes in abundances by adjusting the rates of synthesis or degradation^{7}, but such adjustments are only certain to suppress noise if the individual deviations immediately and surely affect the rates^{5}. Even the simplest transcriptional autorepression by contrast involves gene activation, transcription and translation, introducing intermediate probabilistic events that can randomize or destabilize control. Negative feedback may thus either suppress or amplify fluctuations depending on the exact mechanisms, reaction steps and parameters^{9} – details that are difficult to characterize at the single cell level and that differ greatly from system to system. This raises a fundamental question: to what extent is biological noise inevitable and to what extent can it be controlled? Could evolution simply favor networks – however elaborate or ingeniously designed – that enable cells to homeostatically suppress any disadvantageous noise, or does the nature of the mechanisms impose inherent constraints that cannot be overcome?

## Control is limited by information loss

To address this question without oversimplifying or guessing at the complexity of cells, we consider a chemical species X_{1} that affects the production of a second species X_{2}, which in turn indirectly controls the production of X_{1} via an arbitrarily complicated reaction network with any number of components, nonlinear reaction rates, or spatial effects (Fig. 1). For generality, we only specify three of the chemical events of the larger network:

${x}_{1}\stackrel{u({x}_{2}(-\infty ,t))}{\to}{x}_{1}+1$ (*i*), ${x}_{1}\stackrel{{x}_{1}/{\tau}_{1}}{\to}{x}_{1}-1$ (*ii*), ${x}_{2}\stackrel{f({x}_{1})}{\to}{x}_{2}+1$ (*iii*) (1)

where *x*_{1} and *x*_{2} are numbers of molecules per cell, the birth and death rates are probabilistic reaction intensities, *τ*_{1} is the average lifetime of X_{1} molecules, *f* is a specified rate function, and the unspecified control network allows *u* to be dynamically and arbitrarily set by the full time history of X_{2} values. Death events for X_{2} are omitted because the results we derive rigorously hold for all types and rates of X_{2} degradation mechanisms, as long as they do not depend on X_{1}. The generality of *u* and *f* allows X_{1} to represent many different biological species: an mRNA with X_{2} as the corresponding protein, a protein with X_{2} as either its own mRNA or an mRNA downstream in the control pathway, an enzyme with X_{2} as a product, or a self-replicating DNA with X_{2} as a replication control molecule.
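
As a baseline for the bounds that follow, the birth and death events of X_{1} can be simulated without any feedback. The sketch below is only illustrative: it assumes a constant birth rate `u` (so the control network is switched off), and the function name, parameters and seed are hypothetical. In this uncontrolled case the stationary distribution is Poisson, so the variance should come out close to the mean, which is the noise level any control network has to improve on.

```python
import random

def simulate_birth_death(u, tau1, t_end, seed=0):
    """Exact (Gillespie) simulation of
         x1 -> x1 + 1  at rate u        (constant: no feedback, an assumption)
         x1 -> x1 - 1  at rate x1/tau1
    Returns the time-averaged mean and variance of x1 over [0, t_end]."""
    rng = random.Random(seed)
    x = int(round(u * tau1))      # start near the stationary mean
    t = 0.0
    sum_x = sum_x2 = 0.0          # time-weighted sums of x and x^2
    while True:
        total_rate = u + x / tau1
        dt = rng.expovariate(total_rate)   # waiting time to the next event
        last = t + dt >= t_end
        if last:
            dt = t_end - t                 # truncate the final interval
        sum_x += x * dt
        sum_x2 += x * x * dt
        if last:
            break
        t += dt
        # pick which reaction fired, in proportion to its rate
        if rng.random() * total_rate < u:
            x += 1
        else:
            x -= 1
    mean = sum_x / t_end
    var = sum_x2 / t_end - mean * mean
    return mean, var

if __name__ == "__main__":
    mean, var = simulate_birth_death(u=20.0, tau1=1.0, t_end=2000.0, seed=1)
    print("mean:", mean, "variance/mean:", var / mean)
```

Replacing the constant `u` with a function of the X_{2} history would turn this into a closed-loop simulation, at which point the limits derived below become relevant.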

The arbitrary birth rate *u* represents a hypothetical ‘control demon’ that knows everything about past and present values of *x*_{2} and uses this information to minimize the variance in *x*_{1}. This corresponds to an optimal reaction network capable of any type of time-integration, frequency-based control, spatially extended dynamics, or other exotic actions. The sole restriction is that the control system depends on *x*_{1} only via reaction (*iii*), an example of a common chemical signaling relay where a concentration determines a rate. Because individual X_{2} birth events are probabilistic, some information about X_{1} is then inevitably and irrecoverably lost and the current value of X_{1} cannot be perfectly inferred from the X_{2} time-series. Specifically, the number of X_{2} birth events in a short time period is on average proportional to *f*(*x*_{1}), with a statistical uncertainty that depends on the average number of events. If *x*_{1} remained constant, the uncertainty could be arbitrarily reduced by integrating over a longer time, but because it keeps changing randomly on a time scale set by *τ*_{1}, integration can only help so much. The problem is thus equivalent to determining the strength of a weak light source by counting photons: each photon emission is probabilistic, and if the light waxes and wanes, counts from the past carry little information about the current strength. The otherwise omniscient control demon thus cannot know the exact state of the component it is trying to control.

We then quantify how finite signaling rates restrict noise suppression, without linearizing or otherwise approximating the control systems, by analytically deriving a feedback-invariant upper limit on the mutual information^{10} between X_{1} and X_{2} – an information-theoretic entropic measure for how much knowing one variable reduces uncertainty about another – and derive lower bounds on variances in terms of this limit. We use a continuous stochastic differential equation for the dynamics of species X_{1}, an approximation that makes it easier to extend the results to more contexts and processes, but keep the signaling and control processes discrete. After considerable dust has settled, this theory (summarized in Box 1 and detailed in the Supplementary Information, SI) allows us to calculate fundamental lower bounds on variances.

### Box 1

#### Outline of underlying theory

Statistical uncertainties and dependencies are often measured by variances and correlation coefficients, but both uncertainty and dependence can also be defined purely in terms of probabilities (*p*_{i}), without considering the actual states of the system. The Shannon entropy *H*(*X*) = −Σ*p*_{i}log*p*_{i} measures inherent uncertainty rather than how different the outcomes are, and the mutual information between random variables *I*(*X*_{1}; *X*_{2}) = *H*(*X*_{1}) − *H*(*X*_{1}|*X*_{2}) measures how much knowing one variable reduces entropic uncertainty in another, regardless of how their outcomes may correlate^{10}^{,}^{27}. Despite the fundamental differences between these measures, however, there are several points of contact that can be used to predict limits on stochastic behavior.

First, because imperfectly estimating the state of a system fundamentally restricts the ability to control it (SI), there is a hard bound on variances whenever there is incomplete mutual information between the signal *X*_{2} and the controlled variable *X*_{1}. We quantify the bound by means of Pinsker’s nonanticipatory epsilon entropy^{28}, a rarely utilized information-theoretic concept that exploits the fact that the transmission of information in a feedback system must occur in real time. This shows (SI) how an upper bound on the mutual information *I*(*X*_{1}; *X*_{2}) – i.e. a limited Shannon capacity in the channel from *X*_{1} to *X*_{2} – imposes a lower bound on the mean squared estimation error $E{({X}_{1}-{\hat{X}}_{1})}^{2}$, where the ‘estimator’ ${\hat{X}}_{1}$ is an arbitrary function of the discrete signal *X*_{2} time series and the *X*_{1} dynamics at equilibrium is described by a stochastic differential equation. Since the capacity of the molecular channels we consider is not increased by feedback, this results in a lower limit on the variance of *X*_{1}, in terms of the channel capacity *C*, that holds for arbitrary feedback control laws: ${\sigma}_{1}^{2}/\langle {x}_{1}\rangle \ge {(1+C{\tau}_{1})}^{-1}$.

Second, the Shannon capacity is potentially unlimited when information is sent over point process ‘Poisson channels’^{29}, ${x}_{2}\stackrel{f}{\to}{x}_{2}+1$, as in stochastic reaction networks where a controlled variable affects the rate of a probabilistic signaling event. However, infinite capacity requires that the rate *f*(*x*_{1}) is unrestricted and thus that *X*_{1} is unrestricted – contrary to the purpose of control. Here we consider two types of restrictions. First, if the rate has an upper limit *f*_{max} it follows^{30} that *C* = *K*<*f*> where *K* = log(*f*_{max}/<*f*>). The channel capacity then equals the average intensity multiplied by the natural logarithm of the effective dynamic range *f*_{max}/<*f*>, and the noise bound follows ${\sigma}_{1}^{2}/{\langle {x}_{1}\rangle}^{2}\ge 1/({N}_{1}(K{N}_{2}+1))$. This allows for any nonlinear function *f*(*x*_{1}) but, for specific functions, restricting the variance in *x*_{1} can further reduce the capacity. For example, we analytically show that the capacity of the generic Poisson channel subject to mean and variance constraints follows $C=\langle f\rangle \log (1+{\sigma}_{f}^{2}/{\langle f\rangle}^{2})$. Having less noise in *x*_{1} will reduce the variance in *f* and thereby make it harder to transmit the information that is fundamentally required to reduce noise. Combining this expression for the channel capacity with the feedback limit above reveals hard limits beyond which no improvements can be made: any further reduction in the variance would require a higher mutual information, which is impossible to achieve without instead increasing the variance. When *f* is linear in *x*_{1} this produces the result in Eq. (2). Analogous calculations allow us to derive capacity and noise results when *f* is a Hill function, or for processes with bursts, extrinsic noise, parallel channels, and cascades (SI). Finite channel capacities are the only fundamental constraints considered here, so at infinite capacity perfect noise suppression is possible by construction.
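
For the linear case *f* = *αx*_{1}, the combination described above can be sketched in a few lines. This is a reconstruction from the two expressions in this box rather than a transcription of the SI: the shorthand *V* for the squared relative standard deviation is introduced here, and the step that replaces log(1+*V*) by *V* (using log(1+*V*) ≤ *V*) is an assumption about how the closed form is reached.

```latex
% Shorthand (not in the original text):
%   V   = \sigma_1^2 / \langle x_1 \rangle^2
%   N_1 = \langle x_1 \rangle,  N_2 = \langle f \rangle \tau_1 = \alpha \langle x_1 \rangle \tau_1
\begin{align*}
\frac{\sigma_1^2}{\langle x_1\rangle} &\ge \frac{1}{1 + C\tau_1},
\qquad
C = \langle f\rangle \log\!\left(1 + \frac{\sigma_f^2}{\langle f\rangle^2}\right)
  = \langle f\rangle \log(1 + V)
  \quad \text{for } f = \alpha x_1, \\[4pt]
N_1 V &\ge \frac{1}{1 + N_2 \log(1 + V)} \;\ge\; \frac{1}{1 + N_2 V}, \\[4pt]
N_1 N_2 V^2 + N_1 V - 1 &\ge 0, \\[4pt]
V &\ge \frac{-1 + \sqrt{1 + 4N_2/N_1}}{2N_2}
   = \frac{1}{N_1}\,\frac{2}{1 + \sqrt{1 + 4N_2/N_1}} .
\end{align*}
```

When *N*_{2} is much larger than *N*_{1} the right-hand side approaches 1/√(*N*_{1}*N*_{2}), so the relative standard deviation √*V* falls only as the fourth root of *N*_{2}.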

## Noise limited by 4^{th} root of signal rate

When the rate of making X_{2} is proportional to X_{1}, *f* =*αx*_{1}, for example when X_{1} is a template or enzyme producing X_{2}, the hard lower bound on the (squared) relative standard deviation created by the loss of information follows:

${\sigma}_{1}^{2}/{\langle {x}_{1}\rangle}^{2}\ge \frac{1}{{N}_{1}}\times \frac{2}{1+\sqrt{1+4{N}_{2}/{N}_{1}}}$ (2)

where <…> denotes population averages and *N*_{1} = <*u*>*τ*_{1} = <*x*_{1}> and *N*_{2} = *α*<*x*_{1}>*τ*_{1} are the numbers of birth events of X_{1} and X_{2} made on average during time *τ*_{1}. Thus no control network can significantly reduce noise when the signal X_{2} is made less frequently than the controlled component. When the signal is made more frequently than the controlled component, the minimal relative standard deviation (square root of Eq. (2)) at most decreases with the *quartic* root of the number of signal birth events. Reducing the standard deviation of X_{1} 10-fold thus requires that the signal X_{2} is made at least 10,000 times more frequently. This makes it hard to achieve high precision, and practically impossible to achieve extreme precision, even for the slowest changing X_{1} in the cell where the signals X_{2} may be faster in comparison.
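
The scaling claims in this paragraph can be checked numerically. The sketch below assumes that Eq. (2) has the closed form (1/*N*_{1}) × 2/(1 + √(1 + 4*N*_{2}/*N*_{1})), which is what the Box 1 expressions give for linear *f*; the function names are hypothetical.

```python
import math

def noise_bound(n1, n2):
    """Assumed form of Eq. (2): lower bound on sigma_1^2 / <x1>^2.

    n1: average number of X1 birth events during tau1 (equal to <x1>)
    n2: average number of X2 birth events during tau1
    """
    return (1.0 / n1) * 2.0 / (1.0 + math.sqrt(1.0 + 4.0 * n2 / n1))

def min_relative_sd(n1, n2):
    """Minimal relative standard deviation (square root of the bound)."""
    return math.sqrt(noise_bound(n1, n2))

if __name__ == "__main__":
    n1 = 10.0
    # With no signaling events the bound reduces to the Poisson value 1/N1.
    print("no signal:", noise_bound(n1, 0.0))
    # Signal made as often as the controlled component: little improvement.
    print("N2 = N1  :", noise_bound(n1, n1))
    # Raising N2 by a factor 10^4 lowers the minimal SD roughly 10-fold.
    ratio = min_relative_sd(n1, 1e6) / min_relative_sd(n1, 1e10)
    print("SD ratio for 10^4-fold more signal events:", ratio)
```

With *N*_{2} = *N*_{1} the bound stays above 60% of the uncontrolled value 1/*N*_{1}, which is the sense in which a signal made no more often than the controlled component cannot reduce noise significantly.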

Systems with nonlinear amplification before the infrequent signaling step are also subject to bounds. For arbitrary nonlinear encoding where *f* is an arbitrary functional of the whole *x*_{1} time history – corresponding to a second control demon between X_{1} and X_{2} – the quartic root limit turns into a type of square root limit (Box 1 and SI). However, gene regulatory functions typically saturate at full activation or leak at full repression, as in the generalized Hill function $f=v({K}_{1}+{x}_{1}^{h})/({K}_{2}+{x}_{1}^{h})$ with *K*_{1} < *K*_{2}. Here X_{1} may be an activator or repressor, and X_{2} an mRNA encoding either X_{1} or a downstream protein. Without linearizing *f* or restricting the control demon, an extension of the methods above (SI) reveals similar quartic root bounds as in Eq. (2), with the difference that *N*_{2} is replaced by *γN*_{2,max} where *γ* is on the order of one in a wide range of biologically relevant parameters (SI), and *N*_{2,max} = *vτ*_{1} = *N*_{2}*v*/<*f*>. Cells can then produce far fewer signal molecules without reducing the information transfer, depending on the maximal rate increase *v*/<*f*>, but the quartic root effect still strongly dampens the impact on the noise limit. If X_{2} is an mRNA, *N*_{2,max} is also limited because transcription events tend to be relatively rare even for fully expressed genes.
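
To make the substitution concrete, the sketch below evaluates the generalized Hill function and the bound with *N*_{2} replaced by *γN*_{2,max}. It is illustrative only: the closed form of the bound is the one assumed for Eq. (2) from Box 1, `gamma=1.0` is a placeholder for a factor the text describes only as "on the order of one", and all function names are hypothetical.

```python
import math

def generalized_hill(x1, v, k1, k2, h):
    """f = v (K1 + x1^h) / (K2 + x1^h) with K1 < K2.

    Leaks at v*K1/K2 when x1 = 0 and saturates at v for large x1.
    """
    xh = x1 ** h
    return v * (k1 + xh) / (k2 + xh)

def hill_noise_bound(n1, n2, v_over_mean_f, gamma=1.0):
    """Bound on sigma_1^2/<x1>^2 with N2 replaced by gamma * N2,max.

    N2,max = v * tau1 = N2 * v/<f>; gamma = 1.0 is an assumed placeholder.
    """
    n2_max = n2 * v_over_mean_f
    n2_eff = gamma * n2_max
    return (1.0 / n1) * 2.0 / (1.0 + math.sqrt(1.0 + 4.0 * n2_eff / n1))

if __name__ == "__main__":
    print("leak rate   :", generalized_hill(0.0, v=10.0, k1=1.0, k2=4.0, h=2.0))
    print("near maximum:", generalized_hill(1e6, v=10.0, k1=1.0, k2=4.0, h=2.0))
    # A 16-fold maximal rate increase lowers the minimal SD by less than 2-fold.
    base = hill_noise_bound(10.0, 100.0, v_over_mean_f=1.0)
    boosted = hill_noise_bound(10.0, 100.0, v_over_mean_f=16.0)
    print("SD improvement factor:", math.sqrt(base / boosted))
```

The last line illustrates the dampening described above: because the bound depends on the square root of the effective signal number, a 16-fold gain in *v*/<*f*> buys less than a 2-fold reduction in the minimal standard deviation.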

Many biological systems show much greater fluctuations due to upstream sources of noise, or sudden ‘bursts’ of synthesis^{4}^{,}^{11}^{,}^{12}. If X_{1} molecules are made or degraded in bursts (size *b*_{1}, averaged over births and deaths) there is much more noise to suppress, and if signal molecules X_{2} are produced in bursts (size *b*_{2}) each independent burst only counts as a single signaling event in terms of the Shannon information transfer, and:

${\sigma}_{1}^{2}/{\langle {x}_{1}\rangle}^{2}\ge \frac{1}{{N}_{1}/{b}_{1}}\times \frac{2}{1+\sqrt{1+4({N}_{2}/{b}_{2})/({N}_{1}/{b}_{1})}}$ (3)

The effective average number of molecules or events is thus reduced by the size of the burst, which can increase the noise limits greatly in many biological systems. The effect of slower upstream fluctuations in turn depends on their time-scales, how they affect the system, and whether or not the control system can monitor the source of such noise directly. If noise in the X_{1} birth rate is extrinsic to X_{1} but not directly accessible by the controller, the predicted noise suppression limits can follow similar quartic root principles for both fast and slow extrinsic noise, while for intermediate time-scales the power-law is between 3/8 and ¼ (SI, and Fig 2).
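
The effect of bursts on the limit can be sketched by dividing each event number by its burst size before applying the quartic root bound. This reading of the burst result is an assumption based on the statement that the effective numbers are reduced by the burst size; the closed form of the bound is the one assumed for Eq. (2), and the function name is hypothetical.

```python
import math

def burst_noise_bound(n1, n2, b1=1.0, b2=1.0):
    """Bound on sigma_1^2/<x1>^2 with bursts (assumed substitution).

    n1, n2: average numbers of X1 and X2 molecules made during tau1
    b1:     X1 burst size, averaged over births and deaths
    b2:     X2 burst size
    """
    n1_eff = n1 / b1   # effective number of independent X1 events
    n2_eff = n2 / b2   # each X2 burst counts as one signaling event
    return (1.0 / n1_eff) * 2.0 / (1.0 + math.sqrt(1.0 + 4.0 * n2_eff / n1_eff))

if __name__ == "__main__":
    no_bursts = burst_noise_bound(100.0, 1e4)
    signal_bursts = burst_noise_bound(100.0, 1e4, b2=10.0)
    both_bursts = burst_noise_bound(100.0, 1e4, b1=5.0, b2=5.0)
    print("no bursts           :", no_bursts)
    print("X2 in bursts of 10  :", signal_bursts)
    print("both in bursts of 5 :", both_bursts)
```

When both species burst with the same size the ratio of effective numbers is unchanged, so under this assumption the bound simply scales with the burst size.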

## Information losses in cascades

Signaling in the cell typically involves numerous components that change in probabilistic events with finite rates. Information about upstream states is then progressively lost at each step much like a game of ‘broken telephone’ where messages are imperfectly whispered from person to person. If each signaling component X_{i+1} decays exponentially and is produced at rate *α*_{i}*x*_{i}, an extension of the theory (SI) shows that if a control demon monitors X_{n+1} and controls X_{1}, *N*_{2} above is replaced by

${N}_{\mathit{eff}}={\left(\sum_{j=2}^{n+1}1/{N}_{j}\right)}^{-1}$ (4)

where *N*_{j} is the average number of birth events (or bursts, as in Eq. (3)) of species *j* during time period *τ*_{1}. Information transfer in cascades is thus limited by the components made in the lowest numbers, and because the total average number of birth events over the *n* steps obeys *N*_{tot} ≥ *n*^{2}*N*_{eff}, a five-step linear cascade requires *at least* 25 times more birth events to maintain the same capacity to suppress noise as a single-step mechanism. This effect of information loss is superficially similar to noise propagation where variation in inputs causes variation in outputs, but though both effects reflect the probabilistic nature of infrequent reactions, the governing principles are very different. In fact, the mechanisms for preventing noise propagation – such as time-averaging or kinetic robustness to upstream changes^{6} – cause a greater loss of information, while mechanisms that minimize information losses – such as all-or-nothing nonlinear effects^{13} – instead amplify noise. Large variation in signaling intermediates is thus not necessarily a sign of reduced precision but could reflect strategies to minimize information loss, which in turn allows tighter control of downstream components.
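
The cascade cost can be illustrated with a short calculation. The sketch assumes that the effective number of signaling events is the reciprocal of the summed reciprocals of the per-step event numbers, a form chosen because it reproduces the stated inequality *N*_{tot} ≥ *n*^{2}*N*_{eff}; the function name is hypothetical.

```python
def effective_events(birth_events):
    """Assumed N_eff for a linear cascade: 1 / sum_j (1 / N_j).

    birth_events: average numbers of birth events (or bursts) of each
    signaling species during tau1, one entry per cascade step.
    """
    return 1.0 / sum(1.0 / n for n in birth_events)

if __name__ == "__main__":
    # Five equal steps: the total cost is exactly n^2 = 25 times N_eff.
    equal = [1000.0] * 5
    print("N_eff:", effective_events(equal), "N_tot:", sum(equal))
    # One rare intermediate dominates the information loss.
    uneven = [100.0, 1e4, 1e4, 1e4, 1e4]
    print("N_eff:", effective_events(uneven), "N_tot:", sum(uneven))
```

The equal-step case is the cheapest way to reach a given *N*_{eff}; any imbalance, as in the second example, leaves *N*_{eff} below the rarest component while the total number of events is far more than 25 times larger.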

The rapid loss of information in cascades also suggests another trade-off: effective control requires a combination of appropriately nonlinear responses and small information losses, but nonlinear amplification in turn requires multiple chemical reactions with a loss of information at each step. The actual bounds may thus be much more restrictive than predicted above, where assuming Hill functions or arbitrary control networks conceals this trade-off. One of the greatest challenges in the cell may be to generate appropriately nonlinear reaction rates without losing too much information along the way.

Parallel signal and control systems can instead improve noise suppression, since each signaling pathway contributes independent information about the upstream state. However, for a given total number of signaling events, parallel control cannot possibly reduce noise below the limits above: the loss of information is determined only by the total frequency of the signaling events, not their physical nature. The analyses above in fact implicitly allow for arbitrarily parallel control with *f* interpreted as the total rate of making control molecules affected directly by X_{1} (SI).

## Systems selected for noise suppression

The results above paint a grim picture for suppression of molecular noise. At first glance this seems contradicted by a wealth of biological counterexamples: molecules are often present in low numbers, signaling cascades where one component affects the rates of another are ubiquitous, and yet many processes are extremely precise. How is this possible if the limits apply universally? First, the transmission of chemical information is not fundamentally limited by the number of molecules present at any given time, but by the number of chemical events integrated over the time-scale of control (i.e., by *N*_{2} rather than <*x*_{2}> above). Second, most processes that have been studied quantitatively in single cells do in fact show large variation, and the anecdotal view of cells as microscopic-yet-precise largely comes from a few central processes where cells can afford a very high number of chemical events at each step, often using post-translational signaling cascades. Just like gravity places energetic and mechanistic constraints on flight but does not confine all organisms to the surface of the earth, the rapid loss of information in chemical networks places hard constraints on molecular control circuits but does not make any level of precision inherently impossible.

It can also be tempting to dismiss physical constraints simply because life seems fine despite them. For example, many cellular processes operate with a great deal of stochastic variation, and central pathways seem able to achieve sufficiently high precision. But such arguments are almost circular. The existence of flight does not make gravity irrelevant, nor do winged creatures simply fly sufficiently well. The challenges are instead to understand the trade-offs involved: what performances are selectively advantageous given the associated costs, and how small fitness differences are selectively relevant?

To illustrate the biological consequences of imperfect signaling we consider systems that must suppress noise for survival and must relay signals through gene expression, where chemical information is lost due to infrequent activation, transcription, and translation. The best characterized examples are the homeostatic copy number control mechanisms of bacterial plasmids that reduce the risk of plasmid loss at cell division. These have been described much like the example above with X_{1} as plasmids and X_{2} as plasmid-expressed inhibitors^{5}, except that plasmids self-replicate with rate *u*(*t*)*x*_{1} and therefore are bound by the quartic root limit for all values of *N*_{1} and *N*_{2} (SI, Fig. 2). To identify the mechanistic constraints when X_{1} production is directly inhibited by X_{2}, rather than by a control demon that is infinitely fast and that delivers the optimal response to every perturbation, we consider a closed toy model:

where X_{1} degradation is a proxy for partitioning at cell division, and the rate of making X_{2} is proportional to X_{1} because each plasmid copy encodes a gene for X_{2}. We then use the logarithmic gains^{6}^{,}^{14} *H*_{12} = −∂ln*u*/∂ln*x*_{2} and ${H}_{22}=\partial \ln ({R}_{2}^{-}/{R}_{2}^{+})/\partial \ln {x}_{2}$ to quantify the percentage responses in rates to percentage changes in levels without specifying the exact rate functions. Parameter *H*_{12} is similar to a Hill coefficient of inhibition, and *H*_{22} determines how X_{2} affects its own rates, increasing when it is negatively auto-regulated and decreasing when it is degraded by saturated enzymes. The ratio *H*_{12}/*H*_{22} is thus a total gain, corresponding to the eventual percentage response in *u* to a percentage change in *x*_{1}. With *τ*_{2} as the average lifetime of X_{2} molecules, stationary fluctuation-dissipation approximations^{6}^{,}^{15} (linearizing responses, SI) then give:

where the limit holds for all *H*_{ij} and *τ*_{i} (SI). This reflects a classic trade-off in control theory: higher total gain suppresses spontaneous fluctuations in X_{1} but amplifies the transmitted fluctuations from X_{2} to X_{1}. Numerical analysis confirms that even a Hill-type inhibition function *u* can get close to the limit (not shown), and thus that direct inhibition can do almost as well as a control demon. However, the parameter requirements can be extreme: the signal molecules must be very short-lived, and the optimal gain ${({H}_{12}/{H}_{22})}_{\mathit{opt}}\approx \sqrt{{N}_{2}/{N}_{1}}$ may be so high that introducing any delays or ‘extrinsic’ fluctuations^{6}^{,}^{16} would destabilize the dynamics. Regardless of the inhibition control network, plasmids thus need to express inhibitors at extraordinarily high rates, and generate strongly nonlinear feedback responses without introducing signaling cascades. Most plasmids indeed take these strategies to the extreme, for example transcribing control genes tens of thousands of times per cell cycle using several gene copies and some of the strongest promoters known. Some plasmids also eliminate many of the cascade steps inherent in gene expression, using small regulatory RNAs, and still create highly nonlinear responses using proofreading-type mechanisms (Fig. 3, *left*). Others partially avoid indirect control by ensuring that the plasmid copies themselves prevent each other’s replication (Fig. 3, *right*), or suppress noise without closing control loops^{17}^{,}^{18} by changing the Poisson nature of the X_{1} and X_{2} chemical events (Eq. (1)). Though such schemes may have limited effects on variances^{11}, some plasmids seem to take advantage of them^{5}.
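
The logarithmic gain and the size of the optimal total gain can be illustrated numerically. The sketch below is not the model analyzed in the paper: it assumes a specific Hill-type inhibition function *u* = *v*/(1 + (*x*_{2}/*K*)^{h}), estimates *H*_{12} = −∂ln*u*/∂ln*x*_{2} by a central difference in log space, and evaluates the approximate optimum √(*N*_{2}/*N*_{1}); all function names and parameter values are hypothetical.

```python
import math

def hill_inhibition(x2, v, k, h):
    """Assumed inhibition function: u = v / (1 + (x2/K)^h)."""
    return v / (1.0 + (x2 / k) ** h)

def log_gain(rate, x, eps=1e-5):
    """-d ln(rate) / d ln(x), by central difference in log space."""
    up = math.log(rate(x * math.exp(eps)))
    down = math.log(rate(x * math.exp(-eps)))
    return -(up - down) / (2.0 * eps)

def optimal_total_gain(n1, n2):
    """Approximate optimal H12/H22, sqrt(N2/N1), as quoted in the text."""
    return math.sqrt(n2 / n1)

if __name__ == "__main__":
    u = lambda x2: hill_inhibition(x2, v=1.0, k=10.0, h=4.0)
    print("H12 at x2 = K   :", log_gain(u, 10.0))     # half the Hill coefficient
    print("H12 at x2 >> K  :", log_gain(u, 1000.0))   # approaches the Hill coefficient
    print("optimal gain for N1 = 10, N2 = 1e5:", optimal_total_gain(10.0, 1e5))
```

For this assumed function the gain never exceeds the Hill coefficient *h*, so with *H*_{22} near one an optimal total gain of about 100 would call for far steeper inhibition than a simple Hill function with modest *h* can provide, which is one way to read the remark that the parameter requirements can be extreme.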

## Outlook

Several recent studies have generalized control-theoretic notions^{19}^{,}^{20} or applied them to biology^{21}^{,}^{22}. Others have demonstrated physical limits on the accuracy of cellular signaling^{13}^{,}^{23}^{,}^{24}^{,}^{25}, for example using fluctuation-dissipation approximations to predict estimation errors associated with a constant number of diffusing molecules hitting a biological sensor^{26}. Interestingly, the latter show that the minimal relative error decreases with the square root of the number of events, regardless of detection mechanism. Some studies have also analyzed the information transfer capacity of open-loop molecular systems^{25}, or extracted valuable insights from Gaussian small-noise approximations. Here we extend these works by developing exact mathematical methods for arbitrarily complex and nonlinear real-time feedback control of a dynamic process of noisy synthesis and degradation. In such systems, the minimal error decreases with the *quartic* root of the integer number of signaling events, making a decent job 16 times harder than a half-decent job. This perhaps explains why there is so much biochemical noise – correcting it would just be too costly – but also constrains other aspects of life in the cell. For example, the noise levels may increase or decrease along signaling cascades, depending on the kinetic details at each step, but information about upstream states is always progressively and irreversibly lost. Though it is tempting to believe that large reaction networks are capable of almost anything if the rates are suitably nonlinear, the opposite perspective may thus be more appropriate: having more steps where one component affects the rates of another creates more opportunities for losing information and fundamentally prevents more types of behaviors. 
While awaiting the detailed models that predict what single cells actually do – which require every probabilistic chemical step to be well characterized – fusing control and information theory with stochastic kinetics thus provides a useful starting point: predicting what cells cannot do.

## Acknowledgments

This research was supported by the BBSRC under grant BB/C008073/1, by the National Science Foundation Grants DMS-074876-0 and CAREER 0720056, and by grants GM081563-02 and GM068763-06 from the National Institutes of Health.

## Footnotes

Supplementary Information is linked to the online version of the paper at www.nature.com/nature

**Author contributions** The three authors (I.L., G.V., and J.P.) contributed equally, and all conceived the study, derived the equations, and wrote the paper.

**Author information** Reprints and permissions information is available at npg.nature.com/reprints. The authors declare no competing financial interests.
