# An ensemble method for identifying regulatory circuits with special reference to the *qa* gene cluster of *Neurospora crassa*

^{*}Physics and Astronomy and

^{‡}Genetics, University of Georgia, Athens, GA 30602; and

^{†}Department of Biological Sciences, Youngstown State University, Youngstown, OH 44555

^{§}To whom correspondence should be addressed. E-mail: arnold@uga.edu.

## Abstract

A chemical reaction network for the regulation of the quinic acid (*qa*) gene cluster of *Neurospora crassa* is proposed. An efficient Monte Carlo method for walking through the parameter space of possible chemical reaction networks is developed to identify an ensemble of deterministic kinetics models with rate constants consistent with RNA and protein profiling data. This method was successful in identifying a model ensemble fitting available RNA profiling data on the *qa* gene cluster.

With genome sequencing projects supplying an almost complete inventory of the building blocks of life, functional genomics is now facing the challenge of “re-assembling the pieces” (1, 2). Time-dependent mRNA (3) and protein profiling (4), protein–protein (5–8) and protein–DNA (9) interaction mapping, and the *in vitro* reconstruction of reaction networks (10, 11) are providing insight into the topology and kinetics of a living cell's full biochemical and gene regulatory circuitry. For the first time, it is now possible to place a particular biological circuit like those describing carbon metabolism, transcription, cell cycle, or the biological clock in simple eukaryotes in a larger context, and to examine the coupling of these circuits (12).

New tools in computational biology are needed to identify these reaction networks, using well-studied subcircuits such as those for carbon metabolism, the cell cycle, or the biological clock as a launch point into the entire circuit of a living cell. The *qa* gene cluster of *Neurospora crassa* and the *GAL* gene cluster of *Saccharomyces cerevisiae* in carbon metabolism have served as early paradigms for eukaryotic gene regulation (13, 14) and are prime candidates for taking a genomic perspective on biological circuits. The regulatory mechanisms of the *qa* and *GAL* clusters, with their transcriptional activators and repressors, are also shared with many other regulatory networks. Because of their relative simplicity, these clusters also provide an opportunity to test new genomic approaches to identifying the chemical reaction networks or biological circuits that underlie many fundamental biological processes (15). Three opportunities now exist for identifying and refining biological circuits: the accumulation of transcriptional profiling data (3), a growing number of approaches to modeling gene regulation (11, 15–21), and the ability to reconstruct biological circuits *in vitro* with a diversity of emergent properties, including bistable (10) and oscillatory (11) activity.

However, profiling data will initially be scarce and the unknown parameters plentiful. Identifying the parameters of a reaction network is further complicated by the facts that the data are noisy and that our knowledge of the network's topology and of its participating molecular species is incomplete, even in well-studied networks such as the λ switch, the *lac* operon, the *trp* operon, or the *GAL* cluster. To circumvent these difficulties, we present a statistical modeling approach called the ensemble method of circuit identification, which bases its predictions not on a single, poorly parameterized circuit model, but on a statistical ensemble (22) of “all” circuit models consistent with the existing profiling data. The approach thus provides quantitative predictions and, most importantly, helps guide the design of new experiments that further constrain the model ensemble.

In this report, we develop a simple reaction network for the regulation of the *qa* cluster and use RNA profiling information to identify the network. As described (14, 23, 24), the *qa* cluster is composed of five structural genes and two regulatory genes, as shown in Fig. 1. The *qa* cluster is induced by a shift to quinic acid as the sole carbon source. Three of the genes in the cluster (*qa-2*, *qa-3*, and *qa-4*) encode enzymes involved in the catabolism of quinic acid, ultimately for entry into the Krebs cycle. One gene, *qa-y*, encodes a permease that allows quinic acid into the cell, and another, *qa-x*, encodes a protein of unknown function. All seven genes in the cluster are transcriptionally activated by the product of *qa-1F*, and the activator is repressed by the product of *qa-1S*. The cluster is also subject to glucose and sucrose repression (23). Our model is based on the gene regulation scheme in Fig. 2, taken from ref. 23, in which *qa-1F* activates expression of all genes in the *qa* cluster and *qa-1S* represses the activator.

## Materials and Methods

### Strains and Media.

The *N. crassa* wild-type strain 74–*OR*23–1*A* (#987, Fungal Genetics Stock Center, Kansas City, KS) was used in these experiments. RNA was isolated from conidia germinated as shake cultures for 12 h at 25°C on 1.5% sucrose Fries minimal medium (25) and then shifted to 0.3% quinic acid Fries medium for 0 to 8 h. All plasmids used to generate probes for RNA hybridization were grown in *Escherichia coli* strain JM101.

### DNA and RNA Isolation.

Plasmid DNA was prepared as in ref. 26. *N. crassa* total RNA was isolated as in ref. 27.

### RNA Hybridization.

Total RNA was fractionated by electrophoresis on formaldehyde-containing agarose gels, transferred to Nytran™, hybridized in 6× SSPE [standard saline phosphate/EDTA (0.18 M NaCl/10 mM phosphate, pH 7.4/1 mM EDTA)]/2× Denhardt's solution/0.1% SDS/50% formamide at 42°C, and rinsed as recommended (Schleicher & Schuell). Six DNA fragments from the *qa* cluster (29) and, as a standard, the histone H3 gene of *N. crassa* (30) were labeled with ^{32}P (28). Densitometric analysis of autoradiograms was performed on a Molecular Dynamics 300A. Northern blots were stripped in 50% formamide/2× SSPE at 65°C, as recommended (Schleicher & Schuell), and reprobed.

## Biological Circuit Model

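Chemical reaction networks have been proposed for the λ phage switch (31), signaling networks (15), the cell cycle in *S. cerevisiae* (32), and carbon metabolism (21, 33). Our model is a chemical reaction network based on a rate equation framework (15), as follows:

$$\frac{dm_x}{dt} = \alpha_x + \frac{\delta_x\, Q(t)\, p_F}{1 + \gamma_x\, p_S^{\,n}} - m_x, \qquad \frac{dp_x}{dt} = m_x - \beta_x\, p_x, \qquad x \in \{F, S, \mathbf{sg}\}. \tag{1}$$
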
Here, *m*_{x} and *p*_{x} denote the mRNA and protein concentrations of gene **x**, where **x** stands for the activator *F* (≡ *qa*-1*F*), the repressor *S* (≡ *qa*-1*S*), and the five structural genes **sg** (≡ *qa*-*x*, *qa*-*y*, *qa*-2, *qa*-3, and *qa*-4). All message levels *m*_{x} are assumed to decay with the same rate constant 1/τ, and the model time *t* and all *m*_{x} and *p*_{x} have been rescaled so that all rescaled translation and mRNA decay rate constants are unity; hence, the dimensionless time *t* in Eq. 1 is related to the physical time *t*^{(phys)} by *t* = *t*^{(phys)}/τ. Possible *qa* cluster activation by additional promoter species not explicitly included in the model (14) is represented by constant basal transcription rates, with rescaled rate coefficients denoted by α_{x}. The rescaled protein decay rates are given by β_{x}. We assume that the concentration of free inducer molecules *Q*, i.e., quinic acid, either decays exponentially according to *Q*(*t*) = *Q*_{0} exp(−κ*t*) or is constant (κ = 0) over time *t*, where κ is the rescaled decay constant and *Q*_{0} is the initial concentration of inducer in the medium. The rate of induced transcription is proportional to the levels of inducer and activator protein, with rescaled rate constants denoted by δ_{x}. The repressor interacts with the activator, and its effect on transcriptional activation is captured by the repressor effects γ_{x}. Transcription of the repressor gene is assumed to be unrepressed, i.e., we set γ_{S} = 0. The Hill exponent *n* is a shape parameter controlling the cooperative effect of the repressor on transcription rates. The model does not include posttranscriptional regulation.

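Steady-state solutions of Eq. 1 for constant inducer in the medium (i.e., κ = 0 and *Q* = *Q*_{0}) can be found and a local stability analysis performed. For a Hill exponent of *n* = 1, the steady state is

$$\bar p_F = p, \quad \bar m_F = \beta_F\, p, \quad \bar m_S = \alpha_S + \delta_S Q_0\, p, \quad \bar p_S = \frac{\bar m_S}{\beta_S}, \quad \bar m_x = \alpha_x + \frac{\delta_x Q_0\, p}{1 + \gamma_x\, \bar p_S}, \quad \bar p_x = \frac{\bar m_x}{\beta_x} \;\; (x \in \mathbf{sg}), \tag{2}$$

where *p* is the unique, positive, stable solution of *Ap*^{2} + *Bp* + *C* = 0, with *A* = γ_{f1}β_{F}δ_{S}*Q*_{0}; *B* = β_{F} + α_{S}β_{F}γ_{f1} − γ_{f1}δ_{S}α_{F}*Q*_{0} − δ_{F}*Q*_{0}; *C* = −α_{F} − α_{F}α_{S}γ_{f1}; and γ_{f1} = γ_{F}/β_{S}. Because *A* > 0 and *C* < 0 for positive parameters, the two roots have opposite signs, so exactly one root is positive.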
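The closed-form steady state can be checked numerically. The Python sketch below is a minimal illustration, assuming that the rate equations have the form dm_x/dt = α_x + δ_x Q p_F/(1 + γ_x p_S^n) − m_x and dp_x/dt = m_x − β_x p_x (our reading of the model description above); the function names and parameter values are hypothetical. It takes the positive root of *Ap*^{2} + *Bp* + *C* = 0 and confirms that the activator mRNA balance vanishes there.

```python
import math


def steady_state_pF(alpha_F, alpha_S, beta_F, beta_S, gamma_F, delta_F, delta_S, Q0):
    """Positive root p of A p^2 + B p + C = 0 (Hill exponent n = 1, constant inducer Q0).

    Assumes all parameters are positive, so A > 0 and C < 0: the roots have opposite
    signs and the '+' branch of the quadratic formula is the positive one.
    """
    g = gamma_F / beta_S  # gamma_f1 in the text
    A = g * beta_F * delta_S * Q0
    B = beta_F + alpha_S * beta_F * g - g * delta_S * alpha_F * Q0 - delta_F * Q0
    C = -alpha_F - alpha_F * alpha_S * g
    return (-B + math.sqrt(B * B - 4.0 * A * C)) / (2.0 * A)


def activator_balance(p, alpha_F, alpha_S, beta_F, beta_S, gamma_F, delta_F, delta_S, Q0):
    """dm_F/dt evaluated at the steady state implied by p_F = p (assumed rate law, n = 1)."""
    p_S = (alpha_S + delta_S * Q0 * p) / beta_S  # unrepressed repressor gene: m_S = beta_S p_S
    m_F = beta_F * p                             # activator protein balance: dp_F/dt = 0
    return alpha_F + delta_F * Q0 * p / (1.0 + gamma_F * p_S) - m_F


# Hypothetical parameter values, chosen only to exercise the formulas.
params = dict(alpha_F=0.1, alpha_S=0.2, beta_F=0.5, beta_S=0.3,
              gamma_F=2.0, delta_F=1.0, delta_S=0.8, Q0=1.0)
p_star = steady_state_pF(**params)
residual = activator_balance(p_star, **params)  # ~0 up to floating-point rounding
print(p_star, residual)
```

Multiplying the balance by (1 + γ_F *p*_S) recovers −(*Ap*^{2} + *Bp* + *C*), which is why the residual vanishes at the root.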

## Ensemble Method for Identifying Kinetics Models

As in the study of most biological circuits, biologically realistic models are likely to remain parameter rich and data poor for the foreseeable future, even with the advent of RNA and protein profiling. To sidestep this problem, we take an approach drawn from statistical mechanics (34) that uses Monte Carlo (MC) simulation methods (35–38), which have found increasingly wide application in biology (39). Instead of trying to identify one model, we use MC simulation to identify an ensemble of models consistent with, and constrained by, the available RNA and protein profiling data. We term this the *ensemble method* for circuit identification. We first give a simplified overview of the method, followed by a full technical description.

Suppose the model and its solution are completely specified by a certain array of parameters that are initially unknown, but are to be constrained by the available experimental data. This array of unknown model parameters is referred to below as the “model parameter vector Θ,” or, for short, the “model Θ.” The basic idea of the ensemble method is to generate an “experimentally constrained” random sample of such Θs, in such a manner that those Θs that yield model predictions “most consistent” with the experimental data are the most likely to be collected into the sample. A model's “degree of consistency” with the experimental data is quantified in terms of a certain figure of merit that measures “how closely” the model's prediction for observed quantities matches the experimental data.

The ensemble simulation starts from an initial Θ that is chosen completely at random, i.e., without any constraint from experimental data. The simulation then proceeds as a random walk in the “space” of all possible Θ, as follows. From the random walk's current location Θ in the model parameter space, a new Θ is constructed by a random “proposal” procedure. If the proposed new Θ improves the figure of merit, it is automatically accepted and becomes the next Θ point visited by the random walk. If the proposed new Θ worsens the figure of merit, it is accepted with a certain probability *P*_{accept} < 1 and rejected with probability 1 − *P*_{accept}. If accepted, the proposed new Θ becomes the next point visited by the random walk; if rejected, the next point visited is identical to the current point, i.e., the random walk does not move. Eventually, this random walk settles into a steady state in which almost all Θs visited are consistent with the experimental data. A large sample of such Θ vectors, visited by the random walk in the steady state, represents the model ensemble.

We now turn to the full technical description of the ensemble method. Let the unknown parameters in the model be denoted by the *M*-tuple Θ := (Θ_{1}, . . . , Θ_{M}). For the kinetics models explored here, Θ comprises the unknown rate coefficients for all reactions *r* = 1, 2, . . . , *M*_{R}, e.g., in Eq. 1, and all unknown initial concentrations [*s*]_{t=0} for all intracellular species *s* = 1, 2, . . . , *M*_{S}. Our desired ensemble is then formally described in terms of a probability distribution *Q*(Θ) on the Θ space of all models.

Next, let *Y* := (*Y*_{1}, . . . , *Y*_{D}) denote the *D*-tuple of all experimental observables, which have been measured in one or a series of *M*_{E} time-dependent profiling experiments, labeled by *e* ∈ {1, . . . , *M*_{E}}, where, in each experiment, the concentrations [*s*] of certain species *s* are measured at time points *t*. Different experiments *e* are distinguished by differing externally controlled and quantitatively known experimental conditions, including, for example, the carbon sources and their concentrations, feeding/starvation schedules, choice of measurement time points, and functional presence or absence of certain genes or proteins, as controlled by gene knockout or enzyme inhibition experiments. If, for example, some linear measure of concentration is used, our data vector *Y* would comprise components

$$Y_l := Y_{s,t,e} := \frac{[s]_{t,e}}{[s]^{(\mathrm{ref})}}, \tag{3}$$

with some (known or unknown) reference concentration [*s*]^{(ref)}, e.g., the maximum [*s*] level during observation. Alternatively, we may want to use a log-concentration measure (3), *Y*_{l} := *Y*_{s,t,e} := ln([*s*]_{t,e}/[*s*]^{(ref)}). Here *l* := (*s*, *t*, *e*), and *s* ∈ *S*′ labels the *M*_{S′} different molecular species, with *S*′ denoting the subset of all species whose time-dependent concentrations actually have been observed. Typically, *S*′ is only a subset (generally a small one!) of the set *S* of all *M*_{S} participating species in the network. With *t* ∈ (*t*_{1}, . . . , *t*_{M_T}) labeling the *M*_{T} different time points at which concentration measurements have been taken, the dimensionality of our data vector *Y* is then

$$D = M_{S'} \times M_T \times M_E. \tag{4}$$

Now, let *F*(Θ) := (*F*_{1}(Θ), . . . , *F*_{D}(Θ)) denote the corresponding vector of predicted values for the observables *Y* in a given model Θ. For the observables *Y*_{s,t,e} described above, the predicted values *F*_{l}(Θ) ≡ *F*_{s,t,e}(Θ) are calculated from Θ by solving the circuit's system of rate equations for the rate coefficients and initial conditions comprised in Θ, and by then extracting from that solution the linear or log-concentration measure for each observed species *s* at each observation time point *t* in each experiment *e*.

It is reasonable to assume (but not fundamental to our ensemble method!) that the probability distribution *P*(*Y*|μ) of the data *Y*, given their corresponding mean values μ = (μ_{1}, … , μ_{D}), can be represented as a multivariate Gaussian without error correlations between different data points *Y*_{l}. Hence, we will use the following data distribution:

$$P(Y \mid \mu) = \prod_{l=1}^{D} \frac{1}{\sqrt{2\pi}\,\sigma_l} \exp\!\left[-\frac{(Y_l - \mu_l)^2}{2\sigma_l^2}\right], \tag{5}$$

with μ_{l} and σ_{l} denoting the mean and standard deviation of the observable *Y*_{l}. If multiple realizations of each profiling experiment are performed, then the full variance–covariance matrix can be estimated and used in Eq. 5 in lieu of σ. Based on prior experience with Northern blots, we assume relative standard deviations σ_{l}/μ_{l} ∼ 0.2–0.3 in the simulations reported below. Heteroscedasticity has been reported not to be an issue (40).
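As a minimal sketch of the figure of merit implied by Eq. 5, the hypothetical Python helper below computes the energy *H* = −ln *P*(*Y*|μ) for independent Gaussian errors with μ_{l} = *F*_{l}(Θ). It assumes, for illustration only, σ_{l} = *r*·μ_{l} with a relative error *r* = 0.25 from the quoted 0.2–0.3 range; the data and predictions are made up.

```python
import math


def energy(Y, F, rel_sigma=0.25, sigma_floor=1e-12):
    """H = -ln P(Y | mu) for independent Gaussian errors, with mu_l = F_l(Theta).

    Illustrative assumption: sigma_l = rel_sigma * |mu_l| (floored to avoid division by
    zero). Because sigma_l then depends on Theta, the normalization term is kept.
    """
    if len(Y) != len(F):
        raise ValueError("data and prediction vectors must have the same length D")
    H = 0.0
    for y, mu in zip(Y, F):
        sigma = max(rel_sigma * abs(mu), sigma_floor)
        H += 0.5 * ((y - mu) / sigma) ** 2 + math.log(math.sqrt(2.0 * math.pi) * sigma)
    return H


# Hypothetical normalized mRNA levels Y and two candidate model predictions F(Theta).
Y = [0.2, 0.6, 1.0, 0.9]
good = [0.25, 0.55, 0.95, 0.9]
bad = [0.9, 0.9, 0.3, 0.1]
H_good, H_bad = energy(Y, good), energy(Y, bad)
ratio = math.exp(H_good - H_bad)  # W(bad)/W(good), with W = exp(-H); tiny here
```

Working with *H* rather than *W* avoids underflow: for even a handful of poorly fitted points, *W* itself can fall below the smallest representable floating-point number, while *H* stays moderate.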

Of course, a given *P*(*Y*|μ) does not uniquely determine the model ensemble *Q*(Θ). There is an infinite manifold of *Q*(Θ) consistent with the data distribution *P*(*Y*|μ), and we have to make reasonable choices. The simplest choice, which we have adopted here, is to take the likelihood as the posterior distribution (with uniform prior) (41), i.e.,

$$Q(\Theta) := \Omega^{-1} W(\Theta) \equiv \Omega^{-1} e^{-H(\Theta)}, \qquad W(\Theta) := P\big(Y \mid F(\Theta)\big) = \prod_{l=1}^{D} \frac{1}{\sqrt{2\pi}\,\sigma_l} \exp\!\left[-\frac{\big(Y_l - F_l(\Theta)\big)^2}{2\sigma_l^2}\right], \tag{6}$$

with the weight *W*(Θ) := *P*(*Y*|*F*(Θ)) and normalization factor Ω := ∑_{Θ}*W*(Θ), where ∑_{Θ} denotes integration over all components of Θ. To emphasize the analogy to the Boltzmann factor in statistical physics, we have also introduced the analogue of a Hamiltonian or energy function, *H*(Θ) := −ln *W*(Θ) (34). More systematic approaches to constructing *Q*(Θ) can also be used, e.g., a posterior probability derived from Bayesian inference and maximum entropy considerations (41–44). For the present proof-of-principle application, we limit ourselves to the choices for *P* and *Q* given above.

In standard data-fitting methods, such as maximum likelihood, least-squares fitting, and maximum entropy approaches, one would attempt to construct *the* correct model by finding a unique Θ that maximizes *Q*(Θ). Because of the large number of unknown model parameters, the (initial) scarcity of experimental data, and the substantial uncertainties in the data, such approaches are bound to fail in the present context. Our basic philosophy is that one should not attempt to find a unique Θ unless the quantity and quality of the underlying data warrant it. Rather, one should admit all Θ as possible candidates for the correct model, with a probability distribution *Q*(Θ) that reasonably reflects each Θ's degree of consistency with the data. The weight *W*(Θ) provides a convenient measure of the degree of consistency of the model Θ with the experimental data and thus serves as our figure of merit.

For any ensemble of the general form *Q*(Θ) := Ω^{−1}*W*(Θ) with an analytically known or numerically calculable weight function *W*(Θ) [with normalization Ω = ∑_{Θ}*W*(Θ)], we can evaluate the ensemble average of *any* quantity *G*(Θ),

$$\langle G(\cdot) \rangle_{[Q]} := \sum_{\Theta} Q(\Theta)\, G(\Theta), \tag{7}$$

as well as, for example, its probability distribution *p*_{[G,Q]}(*g*) := 〈δ(*g* − *G*(·))〉_{[Q]}. This is achieved by a well-established MC method from statistical physics (35–38), in which a random sample (Θ^{1}, . . . , Θ^{I}) distributed according to *Q*(Θ) is generated numerically, e.g., by a Metropolis-type (35–38) random walk Markov chain procedure. The desired expectation 〈*G*(·)〉_{[Q]} is then given, up to controllable statistical sampling errors, by the arithmetic average

$$\langle G(\cdot) \rangle_{I} := \frac{1}{I} \sum_{i=1}^{I} G(\Theta^{i}) \tag{8}$$

over the MC sample; by the ergodic theorem, lim_{I→∞} 〈*G*(·)〉_{I} = 〈*G*(·)〉_{[Q]}. Specifically, in our simulations, the basic random updating step in our Markov chain, from a given Θ to the next, Θ^{+}, proceeds as follows: (*i*) select with equal probability one of the Θ components, Θ_{m}, with *m* ∈ {1, . . . , *M*}; (*ii*) propose an update from Θ_{m} to Θ′_{m} := Θ_{m} + Δ_{m}, where Δ_{m} is drawn uniformly from the interval [−Δ, Δ] with some maximum step width Δ; (*iii*) accept the proposed step with the standard Metropolis acceptance probability *P*_{accept}(Θ → Θ′) = min[1, *Q*(Θ′)/*Q*(Θ)], where Θ′ := (Θ_{1}, . . . , Θ′_{m}, . . . , Θ_{M}). If the proposed step to Θ′ is accepted, set the next Θ in the Markov chain to Θ^{+} = Θ′; otherwise, Θ^{+} = Θ.
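The updating steps (*i*)–(*iii*) can be sketched in Python as a single-component random-walk Metropolis sampler. This is a minimal illustration, not the authors' code: the function name and signature are hypothetical, the target is passed as a function returning ln *W*(Θ) = −*H*(Θ), and acceptance is evaluated in log space. Returning −∞ from that function is one assumed way to confine Θ to an allowed range (e.g., positive rate constants).

```python
import math
import random


def metropolis(log_w, theta0, step, n_warmup, n_accum, thin, rng):
    """Single-component random-walk Metropolis sampler for Q(theta) proportional to W(theta).

    log_w(theta) returns ln W(theta) = -H(theta); returning -math.inf rejects theta outright.
    Returns the states recorded every `thin` steps after `n_warmup` warm-up steps.
    """
    theta = list(theta0)
    lw = log_w(theta)
    sample = []
    for it in range(n_warmup + n_accum):
        m = rng.randrange(len(theta))               # (i) pick one component uniformly
        proposal = list(theta)
        proposal[m] += rng.uniform(-step, step)     # (ii) uniform step in [-step, step]
        lw_new = log_w(proposal)
        # (iii) accept with probability min(1, W'/W); only the ratio is needed, never Omega
        if lw_new >= lw or rng.random() < math.exp(lw_new - lw):
            theta, lw = proposal, lw_new
        if it >= n_warmup and (it - n_warmup) % thin == 0:
            sample.append(list(theta))
    return sample


# Toy check with a hypothetical 2-parameter target: ln W = -|theta|^2 / 2 (standard Gaussian).
sample = metropolis(lambda th: -0.5 * sum(x * x for x in th), [3.0, -3.0], step=2.0,
                    n_warmup=2_000, n_accum=50_000, thin=10, rng=random.Random(1))
mean0 = sum(th[0] for th in sample) / len(sample)
var0 = sum((th[0] - mean0) ** 2 for th in sample) / len(sample)
```

Because only the difference ln *W*(Θ′) − ln *W*(Θ) is used, the normalization Ω never has to be computed. For the toy target, the sample mean and variance of each component should come out close to 0 and 1.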

A crucial point here is that only the weight function *W*(Θ), but not the normalization factor Ω, needs to be evaluated in generating the MC sample, because only ratios *Q*(Θ′)/Q(Θ) = *W*(Θ′)/*W*(Θ) enter into the Metropolis acceptance probability *P*_{accept}. Each such updating step *does* require a completely new solution of the reaction network model to evaluate the new weight *W*(Θ′) for the proposed new Θ′.

In the following, we apply the ensemble approach to the kinetics model, Eq. 1, where Θ ≡ (Θ_{1}, . . ., Θ_{M}) comprises (*i*) the initial concentrations *m*_{x,0} and *p*_{x,0} of the seven mRNA and seven protein species; (*ii*) the initial quinic acid concentration *Q*_{0} and decay rate constant κ; and (*iii*) the rate coefficients α_{x}, β_{x}, γ_{x}, and δ_{x}, where the fixed γ_{S} = 0 is excluded and all five structural-gene β_{x} are set to a common value β_{sg}, because the **sg** proteins do not act back on any other species in our model and, hence, β_{sg} does not affect the model predictions for any measured mRNA species. Assuming a fixed Hill exponent *n* = 1 and a fixed mRNA lifetime τ = 60 min, there are, hence, 25 unknown kinetic parameters (23 rate coefficients together with *Q*_{0} and κ) and 14 initial conditions, i.e., a total of *M* = 39 Θ-variables in our model, to be fitted to only *D* = 42 data points (*M*_{S′} = 6 mRNAs × *M*_{T} = 7 time points × *M*_{E} = 1 experiment) from Fig. 3.
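The parameter and data counts above can be verified with a few lines of arithmetic. In this Python sketch the gene labels are plain strings and the variable names are illustrative.

```python
# Count of Theta-variables and data points for Eq. 1, following the description above.
GENES = ["qa-1F", "qa-1S", "qa-x", "qa-y", "qa-2", "qa-3", "qa-4"]

n_alpha = len(GENES)        # basal transcription rates alpha_x
n_delta = len(GENES)        # induced transcription rates delta_x
n_gamma = len(GENES) - 1    # repressor effects gamma_x; gamma_S = 0 is fixed
n_beta = 3                  # beta_F, beta_S, and one shared beta_sg
n_inducer = 2               # Q_0 and kappa
kinetic = n_alpha + n_delta + n_gamma + n_beta + n_inducer
initial = 2 * len(GENES)    # m_{x,0} and p_{x,0}
M = kinetic + initial
D = 6 * 7 * 1               # M_S' mRNAs x M_T time points x M_E experiments

print(kinetic, initial, M, D)  # 25 14 39 42
```

With *M* = 39 unknowns against *D* = 42 noisy data points, a unique best fit is poorly determined, which is the situation the ensemble method is designed for.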

… *qa* cluster together with that of the histone (H3) as a control at 30 min, 60 min, 90 min, 2 h, 4 h, 6 h, and 8 h. Sizes of some messages are indicated on the right.

The model Eq. 1 was solved using the fourth-order Runge–Kutta method (45) with time step *h* = 3.0 min to compute the model solution *F*_{l}(Θ) in Eq. 6. For *Y*_{l} in Eq. 6, we used the linear concentration measures given by the pixel-count data extracted from Fig. 3, with the maximum mRNA level as the reference [*s*]^{(ref)} in Eq. 3 for each measured mRNA species *s*. At the beginning of the MC random walk, Θ was randomly initialized, with each component Θ_{m} drawn from a wide but finite interval *I*_{m} := [Θ_{m}^{(min)}, Θ_{m}^{(max)}]. A typical walk in Θ space then consisted of 10^{4} MC “warm-up” steps, to equilibrate the Markov chain, followed by 10^{4} MC accumulation steps, with all components of Θ and all corresponding solutions for the time-dependent species concentrations sampled after every 10^{2} steps.
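The numerical solution and the construction of the normalized data vector can be sketched in Python as follows. This is a minimal sketch, assuming the rate equations have the form dm_x/dt = α_x + δ_x Q(t) p_F/(1 + γ_x p_S) − m_x and dp_x/dt = m_x − β_x p_x with *n* = 1 (our reading of the model description); the helper names (`rhs`, `rk4_step`, `solve_at`, `linear_measure`) and all parameter values are hypothetical. With τ = 60 min, the 3.0 min step corresponds to a dimensionless step of 0.05, and all seven observation times are whole multiples of it.

```python
import math

N_GENES = 7  # index 0: qa-1F (activator F), 1: qa-1S (repressor S), 2-6: structural genes


def rhs(t, y, par):
    """Right-hand side of the assumed rate equations (n = 1); y = [m_0..m_6, p_0..p_6]."""
    m, p = y[:N_GENES], y[N_GENES:]
    Q = par["Q0"] * math.exp(-par["kappa"] * t)   # inducer Q(t) = Q0 exp(-kappa t)
    p_F, p_S = p[0], p[1]
    dm = [par["alpha"][i] + par["delta"][i] * Q * p_F / (1.0 + par["gamma"][i] * p_S) - m[i]
          for i in range(N_GENES)]                # gamma[1] = 0: repressor gene unrepressed
    dp = [m[i] - par["beta"][i] * p[i] for i in range(N_GENES)]
    return dm + dp


def rk4_step(t, y, h, par):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(t, y, par)
    k2 = rhs(t + h / 2, [a + h / 2 * k for a, k in zip(y, k1)], par)
    k3 = rhs(t + h / 2, [a + h / 2 * k for a, k in zip(y, k2)], par)
    k4 = rhs(t + h, [a + h * k for a, k in zip(y, k3)], par)
    return [a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
            for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]


def solve_at(y0, par, obs_min, tau_min=60.0, h_min=3.0):
    """States at increasing observation times (minutes), using t = t_phys / tau."""
    h = h_min / tau_min
    y, step, out = list(y0), 0, []
    for t_obs in obs_min:
        target = round(t_obs / h_min)             # observation times are multiples of h
        while step < target:
            y = rk4_step(step * h, y, h, par)
            step += 1
        out.append(list(y))
    return out


def linear_measure(states, gene):
    """Y_l = [s]_t / [s]^(ref), with the maximum observed mRNA level as the reference."""
    levels = [s[gene] for s in states]
    ref = max(levels)
    return [v / ref for v in levels]


OBS_MIN = [30, 60, 90, 120, 240, 360, 480]        # Northern blot sampling times (min)

# Hypothetical parameter values for illustration only (not fitted values).
par = {"Q0": 1.0, "kappa": 0.1,
       "alpha": [0.05] * N_GENES, "delta": [1.0] * N_GENES,
       "gamma": [0.5, 0.0] + [0.5] * 5,
       "beta": [0.3, 0.3] + [0.5] * 5}            # beta_F, beta_S, then shared beta_sg
states = solve_at([0.05] * N_GENES + [0.1] * N_GENES, par, OBS_MIN)
Y_F = linear_measure(states, 0)                   # normalized qa-1F message levels

# Sanity check: without transcription every message decays as m0 * exp(-t_phys / tau).
no_tx = dict(par, alpha=[0.0] * N_GENES, delta=[0.0] * N_GENES)
m_60 = solve_at([1.0] * N_GENES + [0.0] * N_GENES, no_tx, [60])[0][0]
```

In a full run, `solve_at` would be called once per proposed Θ inside the Metropolis loop; with 10^{4} accumulation steps sampled every 10^{2} steps, the resulting sample holds 100 model vectors.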

## Results

In experimental studies of the mRNA levels of the *qa-1F* gene, the level of activator mRNA increases for 4 h, decreases slightly by 8 h, and continues to decrease noticeably by 10 h (Fig. 3; ref. 24). At 10 h, the message level is 38% of that at 4 h. The qualitative mRNA dynamics of the *qa* structural genes are quite similar to those of *qa-1F* in Fig. 3. At present, there are no measurements of the mRNA levels of *qa-1S*. Using the ensemble method described in the previous section, we find a set of models Θ that captures these gene expression dynamics quite well.

The ensemble *Q*(Θ) is a complex object in a high-dimensional parameter space. In Fig. 4, we show projections of the ensemble onto three arbitrarily chosen 2D Θ planes, displayed as scatter plots of an MC sample. Such projections can reveal important interrelations between the parameters. For example, the ensemble appears quite constraining for the rates of protein turnover and induction, as shown in the second and third planes. The basal rates of transcription, shown in the first plane, are much more diffuse and correlated.
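Whether a parameter is “constrained” or “diffuse and correlated” in such projections can also be summarized numerically. The Python sketch below uses hypothetical helper names and only the standard library; for an MC sample of Θ vectors, it computes each component's relative spread (coefficient of variation) and the Pearson correlation between two components. A small relative spread indicates a well-constrained parameter, and a large |*r*| indicates the kind of correlation visible in a 2D scatter plot.

```python
import math


def mean_std(xs):
    """Sample mean and standard deviation (1/I normalization, as in an MC average)."""
    n = len(xs)
    mu = sum(xs) / n
    return mu, math.sqrt(sum((x - mu) ** 2 for x in xs) / n)


def relative_spread(sample, m):
    """Coefficient of variation of component m over the MC sample of Theta vectors."""
    mu, sd = mean_std([theta[m] for theta in sample])
    return sd / abs(mu)


def correlation(sample, a, b):
    """Pearson correlation between components a and b over the MC sample."""
    xs = [theta[a] for theta in sample]
    ys = [theta[b] for theta in sample]
    mx, sx = mean_std(xs)
    my, sy = mean_std(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)


# Hypothetical sample of four 3-component Theta vectors: components 0 and 1 trade off
# against each other, while component 2 barely moves.
sample = [(0.1, 0.9, 2.00), (0.3, 0.7, 2.01), (0.5, 0.5, 1.99), (0.7, 0.3, 2.00)]
r01 = correlation(sample, 0, 1)
cv0 = relative_spread(sample, 0)
cv2 = relative_spread(sample, 2)
```

In this toy sample, components 0 and 1 are perfectly anticorrelated (*r* = −1) and individually diffuse, whereas component 2 is tightly constrained.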

In Fig. 5, we show ensemble averages of the numerical solutions of Eq. 1, obtained from the MC sample whose projections are shown in Fig. 4. The dots in Fig. 5 are the experimental data derived from Fig. 3. The shaded areas at each time *t* are centered on the ensemble averages 〈[*s*]_{t,e}〉 of the respective species concentrations [*s*]_{t,e} and span 4 ensemble standard deviations of [*s*]_{t,e}. As Fig. 5 shows, with a few exceptions, these shaded “ensemble” areas cover the experimental data. Fig. 5 also shows some of the corresponding ensemble predictions for the as-yet-unobserved time evolution of the proteins.
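The shaded bands can be computed directly from the MC sample. The Python sketch below assumes (our reading) that a band spanning 4 ensemble standard deviations extends 2 standard deviations on either side of the ensemble average, and it uses the 1/*I* normalization of the MC average; the function names and data are hypothetical.

```python
import math


def ensemble_band(values, n_std_total=4.0):
    """Ensemble mean and a band of total width n_std_total standard deviations around it."""
    n = len(values)
    mean = sum(values) / n                        # MC average, 1/I normalization
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    half = 0.5 * n_std_total * std                # assumption: +/- 2 std for 4 std in total
    return mean, mean - half, mean + half


def bands_over_time(trajectories):
    """trajectories[i][k]: one species' concentration at time point k in MC sample member i."""
    return [ensemble_band([traj[k] for traj in trajectories])
            for k in range(len(trajectories[0]))]


def coverage(bands, data):
    """Fraction of data points lying inside the ensemble band at their time point."""
    inside = sum(lo <= y <= hi for (_, lo, hi), y in zip(bands, data))
    return inside / len(data)


# Hypothetical sample of three normalized trajectories and one observed data series.
trajectories = [[0.1, 0.5, 1.0], [0.2, 0.6, 0.9], [0.0, 0.4, 1.1]]
observed = [0.1, 0.5, 1.0]
bands = bands_over_time(trajectories)
frac = coverage(bands, observed)
mean3, lo3, hi3 = ensemble_band([1.0, 2.0, 3.0])
```

The coverage fraction gives a single number summarizing how many data points fall inside the shaded areas.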

From a microscopic point of view, chemical reactions proceed by a stochastic process involving discrete, random collision events between discrete molecules (or discrete quasimolecular entities such as the gene activator binding sites on a chromosome). A deterministic model like Eq. 1 captures these stochastic dynamics only approximately (46), at the level of statistical averages, thereby neglecting fluctuation effects arising from the discreteness of molecules and molecular collisions. Such fluctuation effects can be important in systems where the total number of molecules of a species is small. For example, it has been shown that fluctuation effects arising in the stochastic dynamics of gene expression (47, 48) may explain the phenomenon of phenotypic switching. The binding of free inducer molecules (i.e., quinic acid in the cell), activator, and repressor to the activator, gene, or repressor is likely to be a stochastic process subject to substantial fluctuations, because of the small number of reactant molecules in the cell (24, 49).

Following ref. 46, we can construct a *corresponding* stochastic model based on the deterministic model Eq. 1, the circuit shown in Fig. 2, and the parameter estimates provided by the MC sample of model Θ vectors described above. In this corresponding model, the number of molecules of each species is treated as a stochastically evolving integer, and time is advanced in discrete steps of random length, from one collision event to the next, with a step-length distribution determined by the molecular collision rates. The resulting stochastic model has the structure of a discrete-time denumerable Markov chain (50).
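One standard way to realize such a stochastic model is Gillespie's direct method, sketched below in Python. The text does not specify the exact algorithm of ref. 46, so this choice is an assumption; the reaction set in the example (transcription, mRNA decay, translation, and protein decay for a single gene) and its rates are hypothetical and are not the fitted *qa* circuit parameters. The sequence of states, indexed by event number, forms the discrete-time jump chain, and the exponential waiting times attach physical time to it.

```python
import random


def gillespie(x0, stoich, propensities, t_end, rng):
    """Gillespie's direct method for a well-mixed reaction network.

    x0: initial integer counts; stoich[r]: count changes when reaction r fires;
    propensities(x): list of propensities a_r(x). Returns event times and states.
    """
    t, x = 0.0, list(x0)
    times, states = [t], [list(x)]
    while True:
        a = propensities(x)
        a0 = sum(a)
        if a0 <= 0.0:
            break                                 # no reaction can fire any more
        t += rng.expovariate(a0)                  # exponential waiting time, mean 1/a0
        if t > t_end:
            break
        u, r, acc = rng.random() * a0, 0, a[0]
        while acc <= u and r < len(a) - 1:        # pick reaction r with probability a_r / a0
            r += 1
            acc += a[r]
        while a[r] <= 0.0:                        # guard against round-off at the boundary
            r -= 1
        x = [xi + d for xi, d in zip(x, stoich[r])]
        times.append(t)
        states.append(list(x))
    return times, states


# Hypothetical single-gene example (rates per minute, illustration only), x = [mRNA, protein]:
# reactions 0: transcription, 1: mRNA decay, 2: translation, 3: protein decay.
k_tx, d_m, k_tl, d_p = 0.5, 1.0 / 60.0, 0.2, 0.01
stoich = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def single_gene_propensities(x):
    m, p = x
    return [k_tx, d_m * m, k_tl * m, d_p * p]


times, states = gillespie([0, 0], stoich, single_gene_propensities, t_end=480.0,
                          rng=random.Random(7))
```

Running the simulation with different seeds produces different realizations of the random trajectories, whose spread reflects the intrinsic fluctuations that the deterministic model averages away.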

Different realizations of the random trajectories of such a stochastic model are shown in Fig. 6. The results in Fig. 6 indicate that the dynamics of the total number of molecules, e.g., of the *qa*-1*F* gene product, *N*_{mF}, behave qualitatively like the experimentally observed dynamics and like the deterministic model dynamics described above, with maximal expression being reached at ≈4 h. The fact that both the deterministic (Eq. 1) and stochastic models show dynamics in accordance with the experimental data suggests that the gene regulation scheme shown in Fig. 2 is indeed the key molecular mechanism in the gene regulation of the *qa* cluster.

## Discussion

Simple gene regulation schemes like the one in Fig. 2 are at the core of more complex biological circuits, and they represent a model of how we can think about the functioning of the cell. From such a circuit, a formal model (deterministic or stochastic) can be derived and compared with the temporal dynamics of the RNAs and proteins in the cell. Traditional fitting practices for kinetics models are unlikely to succeed, because the models are parameter rich, whereas the scientist is data poor when the information is derived from profiling experiments. Borrowing from Boltzmann's original ideas in statistical mechanics (34), we identify an *ensemble* of models rather than trying to find one or a few solutions to an ill-conditioned fitting problem. To implement this approach computationally, we initiate a random walk (in particular, the realization of a Markov chain) in the model parameter (Θ) space, guided by a figure of merit that quantifies the deviation of the model from the data. Once this random walk settles into a steady state, typically after a few thousand steps in the parameter space, the ensemble of models consistent with the data is realized by the stationary walk (Fig. 4). This ensemble (or distribution of fitted models on the parameter space) captures what we know about the biological circuit. By providing higher moments (e.g., variances) and (joint) distributions, the ensemble method of circuit identification shows not only which model parameters are well (or poorly) specified, but also which new experiments are likely to be the most informative in reducing the uncertainty in our model specification. For example, the basal transcription rates shown in Fig. 4 are poorly constrained by the present experimental data, suggesting the need for additional early-time (<30 min) measurements.

We have successfully used the ensemble method to model profiling data on one of the classic eukaryotic biological circuits, the *qa* gene cluster. The model Eq. 1 is only a first step toward a model of gene regulation in the *qa* cluster that explains the existing data. As more profiling data are obtained under a variety of external control conditions, other features could be added to the model, including, for example, coupling to other circuits, such as aromatic amino acid biosynthesis (33, 51). Coupling of such circuits can lead to a richer repertoire of circuit dynamics (15). It has also been argued that translational control may play an important role in *qa* gene cluster regulation (14), a feature not included in the current model. The model summarized in Eq. 1 can be extended to study this and other potential complications and to yield experimental predictions about possible emergent properties of the biological circuit (15). For example, under some conditions (49, 52), stochastic simulation of this scheme may show oscillatory dynamics or switch-like behavior. This raises the question of whether deterministic kinetics models could be used to predict conditions for an oscillatory response of the *qa* cluster.

Biological circuits such as the *qa* cluster can be perturbed in a variety of ways. Profiling data will be obtained under varying and/or time-dependent sucrose and QA levels, and in circuits modified by gene knockout and/or enzyme poisoning. By means of a single, joint χ^{2} function, all such perturbation experiments can be immediately incorporated into the ensemble approach and treated on an equal footing, leading to systematic refinements and extensions of the circuit model. Extension of the ensemble approach to stochastic reaction kinetics modeling (46) is in principle straightforward. The ensemble method of model identification provides a versatile tool that allows direct inspection of whether a new model “works” in adequately representing the data. A flexible and adaptable fitting tool is key to a detailed understanding of biological circuits, because the resulting model ensembles can be used to guide the design of new, costly experiments that extend our understanding of particular real circuits in a genomic context, and to maximize the likely information gain from such experiments.

## Abbreviations

- MC, Monte Carlo

## References

- *et al.* (2001) Science 291, 1304-1351.
- … 860-921.
- … 680-686.
- … 994-999.
- *et al.* (2000) Nature 403, 623-627.
- … 1143-1147.
- *et al.* (2002) Nature 415, 141-147.
- *et al.* (2002) Nature 415, 180-183.
- *et al.* (2000) Science 290, 2306-2309.
- … 339-342.
- … 335-338.
- … 712-728.
- … 458-476.
- … 15-37.
- … 381-387.
- … 167-186.
- … 209-216.
- … 199-224.
- … 2075-2080.
- … 1693-1698.
- … 27-36.
- … 534-547.
- … 343-348.
- … 3593-3599.
- … 79-143.
- … 2989-2998.
- … 21.
- … 266-267.
- *et al.* (2001) Genetics 157, 979-990.
- … 5347-5360.
- … 7865-7870.
- … 73-88.
- … 1647-1657.
- … 8961-8965.
- … 133-195.
- … 2340-2361.
- … 814-819.
- … 1633-1648.
- … 3116-3136.
- … 337-348.
- … 267-268.

**National Academy of Sciences**


An ensemble method for identifying regulatory circuits with special reference to the qa gene cluster of Neurospora crassa. Proceedings of the National Academy of Sciences of the United States of America. 2002 Dec 24; 99(26):16904.
