Asynchronous abundance fluctuations can drive giant genotype frequency fluctuations

Large stochastic population abundance fluctuations are ubiquitous across the tree of life1–7, impacting the predictability of population dynamics and influencing eco-evolutionary outcomes. It has generally been thought that these large abundance fluctuations do not strongly impact evolution, as the relative frequencies of alleles in the population will be unaffected if the abundance of all alleles fluctuate in unison. However, we argue that large abundance fluctuations can lead to significant genotype frequency fluctuations if different genotypes within a population experience these fluctuations asynchronously. By serially diluting mixtures of two closely related E. coli strains, we show that such asynchrony can occur, leading to giant frequency fluctuations that far exceed expectations from models of genetic drift. We develop a flexible, effective model that explains the abundance fluctuations as arising from correlated offspring numbers between individuals, and the large frequency fluctuations result from even slight decoupling in offspring numbers between genotypes. This model accurately describes the observed abundance and frequency fluctuation scaling behaviors. Our findings suggest chaotic dynamics underpin these giant fluctuations, causing initially similar trajectories to diverge exponentially; subtle environmental changes can be magnified, leading to batch correlations in identical growth conditions. Furthermore, we present evidence that such decoupling noise is also present in mixed-genotype S. cerevisiae populations. We demonstrate that such decoupling noise can strongly influence evolutionary outcomes, in a manner distinct from genetic drift. Given the generic nature of asynchronous fluctuations, we anticipate that they are widespread in biological populations, significantly affecting evolutionary and ecological dynamics.


S2 Effective model of population fluctuations S2.1 Two-genotype case
To model the process of population fluctuations, we first consider a simple population consisting of one genotype, where all individuals are identical.The change in population size of a genotype µ from size N µ in the current time point to size N ′ µ in the next time point can be decomposed as where ∆n µ,i ∈ [−1, 0, 1, . . .], i ∈ [1, N µ ] is -1 plus the offspring number of the i th cell of the N µ cells of the µ genotype that exist at the current time step.We know that if all cells behave the same way, we have to demand ⟨∆n µ,i ∆n µ,j ⟩ = c µ,0 for i = j c µ,1 for i ̸ = j (S2) Here, the brackets ⟨•⟩ represent the mean of a random variable.This is the only mathematical form of the covariance parameters that won't change if we relabel the cells-a symmetry argument that can easily be extended to multiple genotypes.This assumes that the offspring number distributions have finite covariances, i.e. are not heavy-tailed.Note that c 1 < c 0 follows from 0 < ⟨(∆n i − ∆n j ) 2 ⟩ = 2c 0 − 2c 1 .
We can express the variance in total population size in terms of c 0 and c 1 , The form of this variance scaling has been previously noted [1,2].Power law mean-variance scaling of population abundance has been widely observed in ecology, where it is known as Taylor's power law [3,4,5,6,7].For large enough values of N µ , the distribution of N ′ µ will converge in distribution to a gaussian random variable, because of the central limit theorem.So the fluctuations of N ′ µ will solely depend on the variance, var(N ′ µ ).We can now extend the analysis to a population consisting of two genotypes labeled A and B, possibly with non-identical properties (Figure 2A).We can generalize the below analysis to multiple genotypes (SI section S2.3).Extending the symmetry argument from above, we now require five covariance parameters to fully describe the population, The expression for the variance of the total population size for each genotype is the same as in equation S5.The covariance of the total population sizes between genotypes is, Where N tot = N A + N B is the total population size, and is the frequency of genotype A in the population.The frequency of a genotype will also change from one time point to the next, induced by the fluctuations of individuals.The expected frequency variance at the next time point can be written as, If we assume that the frequency deviations are small, we can expand in the deviations to obtain, We can approximate the expected frequency by expanding to second order,

S2.2 Fluctuating environments
We can incorporate the effects of a fluctuating environment into our model, decomposing offspring number fluctuations into intrinsic and extrinsic fluctuations.We consider a fluctuating environment that causes genotype fitness to fluctuate with gaussian noise, w(E) ∼ N (s 0 , v E ).Conditioned on the environmental state, the covariance matrix ⟨∆n µ,i ∆n ν,j |E⟩ corresponds to equation S6, representing the intrinsic fluctuations of the system.And so the total abundance variance conditioned on the environmental state, var(N ′ µ |E), also corresponds to equation S5.To consider the combined effect of the intrinsic and extrinsic fluctuations on the total population abundance, we must marginalize out the effect of environmental fluctuations, The conditional abundance density will be gaussian by the central limit theorem, Here, N (x, µ, v) is a standard gaussian density.We use a standard identity for the product of gaussian densities, By rearranging, applying the above identity, and integrating equation S13, we obtain a final marginal density of var( Thus, the total strength of correlated abundance fluctuations is simply the sum of the strength of fitness fluctuations (extrinsic source) and the strength of the intrinsic fluctuations.Applying the same logic, it immediately follows that the decoupling parameter δ can also be decomposed into intrinsic and extrinsic components, δ → δ I + δ E .

S2.3 Generalization to m genotypes
When there are m distinct genotypes/species present in a population, possibly with differing offspring number covariances, what is the expected frequency variance?
We first consider the possible covariance parameters between the change in offspring number for a single individual across a single generation, ∆n µ,i , where µ labels the genotype and i labels the individual.Again, ∆n µ,i ∈ [−1, 0, 1, . . .], i ∈ [1, N µ ] is -1 plus the offspring number of the i th cell of the N µ cells of the µ genotype that exist at the current time step.For m genotypes, we require 2m + m(m − 1)/2 covariance parameters: We define the initial frequency of genotype A (arbitrarily labeled) as, where The frequency after one generation is similarly defined, and labeled as f ′ A .If we assume that frequency deviations are small, we can expand in the deviations to obtain, Following some algebra, and using equation S16, we arrive at the following expression, We can see that the generalized equation has a similar form to the two-species case, where the first two terms arise from independent offspring number fluctuations, and the remaining terms arise from correlated offspring number fluctuations.

S3 Evolutionary implications
To study the implications of decoupling noise, we consider our two-genotype model in the continuous time diffusion limit under constant selection (assuming small deviations in total population size), Where η(t) is standard gaussian white noise, ⟨η(t)⟩ = 0 and ⟨η(t)η(t ′ )⟩ = δ(t − t ′ ).We use the approximate frequency mean and variance for the drift and diffusion terms from equations S12 and S10, respectively.We have combined several parameters for notational simplicity, This model is similar to those analyzed in previous studies.Gillespie (1974) [8] studied the case where δ = 0 but κ A ̸ = κ B .Conversely, Melbinger and Vergassola (2015) [9] studied the case where

S3.1 Fixation probability
Given that the genotype/mutant gets introduced into the population at some initial frequency f 0 , what is the probability that it will fix in the population?To find this, we'll use the Kolmogorov Backward Equation (KBE) for this system.In general, the KBE is, In this case, the drift and diffusion terms are given by, The steady state solution of equation S23 gives us the fraction of trajectories that end up at each of the absorbing states (here: 0 or 1).After setting equation S23 to 0, and applying the boundary conditions Π(0) = 0 and Π(1) = 1, we can rearrange and integrate to obtain a general solution for the fixation probability, For our system, the fixation probability becomes, Or expanded, Which are in terms of new compound parameters, ) We are typically interested in the case where the initial frequency of the minor genotype is small, perhaps because it arose via spontaneous mutation in the population, or an individual with that genotype migrated to the population.Thus, we would like to focus on the limit of small initial frequency, f 0 ≪ 1.We start by Taylor expanding equation S29 around f 0 = 0, Following some algebra, we see that the second-order term in the expansion is negligible compared to the first-order term when, In this limit, we can truncate equation S36 to first order in f 0 ; the fixation probability becomes, With this expression, we now wish to analyze the behavior of p f ix in the limit of weak selection, s e → 0. We thus expand in s e , By comparing the first two terms, we see that p f ix is approximately constant at low s e , This approximation is valid when s e ≪ s * e , where, We recover the classical expression for the critical fitness effect when decoupling noise is weak, δ ≪ κ, and when ∆κ = 0, and thus we also see that the fixation probability when s e ≪ s * e is simply f 0 in this case.Now we turn to analyzing the opposite limit, where selection is strong.We will go back to the unapproximated expression for p f ix in equation S32.Rearranging, we obtain, When s e ≫ s * e and ∆κ = 0, we can expand in either small δ or small N e to recover Kimura's classical fixation probability [10], This approximation is also valid in the limit of weak decoupling noise, δ ≪ 6κ f0(2f0−3) .

S3.2 Site frequency spectrum
The site frequency spectrum (SFS) describes the expected density of derived alleles at a given frequency; specifically p SF S (f )df is the number of derived alleles in the frequency range . We calculate the SFS for alleles affected by decoupling noise by leveraging previously described approaches [12,13] along with our Langevin equation (equation S20).We find a general closed form solution for the SFS in terms of the frequency of alleles with parameters from A, When f = 0.5 and ∆κ = 0, the SFS reduces to, In the limit that δ ≫ 1/N e , and, Similarly, the limit of the SFS as f → 1 is, Applying the limit that δ ≫ 1/N e , If we are interested in the weak selection limit, where s → 0, and again ∆κ = 0 then equation S46 becomes, In the limit of strong decoupling noise, δ ≫ N e , equation S53 becomes, In the high frequency limit, (1 − f ) ≪ 1, We see that when we additionally apply the limit that (1 − f A ) ≫ (δN e ) −1 , we see that the SFS shows an uptick in the intermediate-high frequency regime, We find that in the limit of weak correlated fluctuations, δ ≪ 1/N e , and when κ A = κ B , equation S46 reduces to the classical expression [14] for the site frequency spectrum for mutations under constant selection, i.e.

S4 Chaotic dynamics S4.1 Variance between biological replicates
Here we show how exponentially increasing variance between biological replicates is equivalent to exponentially diverging trajectories, which is characteristic of chaotic dynamics.Specifically, in a one-dimensional system where we observe two trajectories, f i and f j , initially separated by ϵ, we expect that at small times, their distance from each other will grow exponentially, In chaotic systems, λ > 0. If we observe N initially nearby trajectories, labeled f i (dropping the time index for simplicity), then the variance between trajectories is, From equation S58, we can then replace (f i − f j ) 2 with ϵ i,j e 2λt , obtaining, var f (t) = var f (0) e 2λt (S65) Figure S1: Fluctuations within the barcoded L library and relative to S (related to Figure 1A-B).
(A, B) Randomly barcoded libraries of E. coli strains S and L were propagated together in their native serial dilution environment (previously reported data [15]).In the muller plot representation of lineage sizes, we see that the total frequency of L relative to S shows large fluctuations.However, neutral barcoded lineages within L show substantially smaller fluctuations relative to each other.2).We computed the fitness effect of S relative to L across frequency.Consistent with prior measurements [16,17], we find negative frequency-dependent fitness effects.Error bars represent 95% confidence intervals.4).We embedded the overnight trajectories in dimensions 1-3 by using a lag of 1-2 time points.(A) We computed the Lyapunov exponents using the method of Rosenstein et al. [18] for all conditions.(B) In order to choose which set of dimension and lag hyperparameters were most suitable for our data, we performed shuffle splitting cross-validation, and computed an out-of-sample mean squared error (MSE) for all conditions.All error bars represent 95% confidence intervals.
Figure S9: We quantified the rank correlation of replicates with its rank at the final time (related to Figure 4).The increase in correlation in the last 7 hours further shows how the decoupling noise accumulates over time, and that the rank position of each replicate is not "frozen in" earlier in the time course.and N B,1 ∼ N N B,0 , (c 0B − c 1B )N B,0 + c 1B N 2 B,0 .In the simulations, we used c 0A = c 0B = 1 and c 1A = c 1B = 0.01 (and c AB = 0).We defined f = N A /(N A + N B ).We find that the stochastic decoupling noise induces an effective frequency-dependent fitness effect, which agrees well with the theoretical prediction, c 1B (1 − f 0 ) − c 1A f 0 − c AB (1 − 2f ).
Figure S15: Site frequency spectrum with decoupling noise and constant selection.The same as Figure 7B, except where s = 0.02.The blue-green solid lines represent full analytical solutions for dynamics with decoupling noise, across different values of δ.The round markers show simulation results.The black dotted lines represent the case where there are no decoupling noise, but there is still (constant) natural selection.The red dashed lines represents the case where there is neither decoupling noise nor natural selection.We used c 1A = c 1B , ρ AB = 0.5, N e = 10 3 and s = 0.02.

Figure S2 :
Figure S2: Coculture of REL606 and ∆pykF mutant (related to Figure 1D-E).(A) Cocultures were split at day 0 into 8 biological replicates, then propagated separately in the same environment.Flow cytometry measurements were taken every day.(B) We computed the variance between biological replicates, and compared it to the expectation that classical genetic drift is the sole source of population stochasticity, assuming N e = 10 5 .Error bars represent 95% CIs.

Figure S3 :
Figure S3: Inferred power-law exponents for variance and covariance quantities, as a function of mean S frequency (related to Figure 2).Error bars represent 95% confidence intervals, computed via bootstrapping.

Figure S4 :
Figure S4: Ratio of abundance fluctuation strengths between S and L, including the f 2 factor for S (related to Figure2).The ratio is equal to c 1S /c 1L under our model.We see that the ratio is a small factor of order 1, indicating that S and L fluctuate by approximately the same amount.Error bars represent 95% confidence intervals.

Figure S5 :
FigureS5: Frequency-dependent fitness effects of S and L (related to Figure2).We computed the fitness effect of S relative to L across frequency.Consistent with prior measurements[16,17], we find negative frequency-dependent fitness effects.Error bars represent 95% confidence intervals.

Figure S6 :
Figure S6: Correlation between the abundance of S and L (related to Figure2).Our model predicts that the correlation between abundance should be constant when decoupling noise is dominant (ρ AB = c AB / √ c 1A c 1B ).But once classical genetic drift becomes non-negligible, the correlation should decline; this perhaps indicates that the lowest data point is around the cross-over between the drift-and decoupling-dominated regimes.We fit the theoretical curve to our experimental data via ordinary least squares regression.Error bars are standard errors.

Figure S7 :
FigureS7: Comparison of curve fits to the variance of overnight frequency trajectories (related to Figure4).(A) Best fits of various curves to the experimental variance measurements.(B) Akaike Information Criterion (AIC) of fitted models to experimental data, to evaluate goodness-of-fit.We computed FDR-corrected p-values to evaluate if the difference between the AIC of the exponential model compared to the other fits was signficant.The p-values were p =0.018, 0.058, and 0.022 for the comparison to the linear, quadratic, power law models respectively.

Figure S8 :
Figure S8: Cross-validation to compute Lyapunov exponent (related to Figure4).We embedded the overnight trajectories in dimensions 1-3 by using a lag of 1-2 time points.(A) We computed the Lyapunov exponents using the method of Rosenstein et al.[18] for all conditions.(B) In order to choose which set of dimension and lag hyperparameters were most suitable for our data, we performed shuffle splitting cross-validation, and computed an out-of-sample mean squared error (MSE) for all conditions.All error bars represent 95% confidence intervals.

Figure S10 :
Figure S10: Frequency-dependent fitness effects of S and L (related to Figure 5).Scatter points represent change in logit frequency across a single day.Fitted line represents marginal frequencydependent effect from the hierarchical model, described in the main text.

Figure S11 :
Figure S11: Frequency variance scaling in batch 4 in Venkataram et al.[19] data (related to Figure6).Colored dashed lines represent the indicated scaling.Unconstrained power-law fits yields power law exponents of 2.15 ± 0.13 at high frequencies and 1.03 ± 0.08 at low frequencies.

Figure S12 :Figure S13 :
Figure S12: Mean squared displacement estimate computed with different frequency range, from Venkataram et al.[19] data (related to Figure6B).Here, we focused on barcodes from a thinner mean frequency range (compared to the main text), 7•10 −4 to 2•10 −3 .As there were fewer barcodes in this range, the estimates are noisier.Error bars represent 95% CIs.