# Mixed-integer nonlinear optimisation approach to coarse-graining biochemical networks

^{1} S.J. Bornheimer,^{2,3} V. Venkatasubramanian,^{4} and S. Subramaniam^{1,2,3,5}

^{1}Department of Bioengineering, University of California, San Diego, 9500 Gilman Dr La Jolla, CA 92093, USA

^{2}Departments of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Dr La Jolla, CA 92093, USA

^{3}Cellular and Molecular Medicine, University of California, San Diego, 9500 Gilman Dr La Jolla, CA 92093, USA

^{4}Laboratory for Intelligent Process Systems, School of Chemical Engineering, Purdue University, West Lafayette, IN 47907, USA

^{5}Graduate Program in Bioinformatics, University of California, San Diego, 9500 Gilman Dr La Jolla, CA 92093, USA

## Abstract

Quantitative modelling and analysis of biochemical networks is challenging because of the inherent complexities and nonlinearities of the system and the limited availability of parameter values. Even if a mathematical model of the network can be developed, the lack of large-scale good-quality data makes accurate estimation of a large number of parameters impossible. Hence, coarse-grained models (CGMs) consisting of essential biochemical mechanisms are more suitable for computational analysis and for studying important systemic functions. The central question in constructing a CGM is which mechanisms should be deemed ‘essential’ and which can be ignored. Also, how should parameter values be defined when data are sparse? A mixed-integer nonlinear-programming (MINLP) based optimisation approach to coarse-graining is presented. Starting with a detailed biochemical model with associated computational details (reaction network and mathematical description) and data on the biochemical system, the structure and the parameters of a CGM can be determined simultaneously. In this optimisation problem, the authors use a genetic algorithm to simultaneously identify parameter values and remove unimportant reactions. The methodology is exemplified by developing two CGMs for the GTPase-cycle module of M1 muscarinic acetylcholine receptor, Gq, and regulator of G protein signalling 4 [RGS4, a GTPase-activating protein (GAP)] starting from a detailed model of 48 reactions. Both the CGMs have only 17 reactions, fit experimental data well and predict, as does the detailed model, four limiting signalling regimes (LSRs) corresponding to the extremes of receptor and GAP concentration. The authors demonstrate that coarse-graining, in addition to resulting in a reduced-order model, also provides insights into the mechanisms in the network. 
The best CGM obtained for the GTPase cycle also contains an unconventional mechanism and its predictions explain an old problem in pharmacology, the biphasic (bell-shaped) response to certain drugs. The MINLP methodology is broadly applicable to larger and complex (dense) biochemical modules.

## 1 Introduction

Biochemical systems are nonlinear and complex because of the presence of multiple species, reactions and their complexes [1]. Computational analysis of these systems in full detail is impractical because all parameter values must be known, which is rarely true. Parameter estimation can overcome this limitation to some extent, but both computational complexity and inaccuracy (uncertainty) of estimated parameters increase with the number of unknown parameters. An equally important reason for simplifying detailed models of complex networks is the inability to effectively interpret the myriad predictions that can be made about perturbations of a complex system. With the availability of a limited number of readouts (measurements) from the system that can be used to estimate the parameters, many predictions may be unrealisable with respect to unmeasurable states (components), possibly leading to inaccurate predictions of some of the measurable states. Such states could be component concentrations or derived quantities such as total concentration of a protein in its active state. Clearly, there is a need to develop methods to reduce the size and complexity of computational models of biochemical networks while retaining functional features and predictive accuracy. This can often be done by decomposing complex biochemical networks into distinct biochemical modules [2, 3] that can be studied in isolation and then reconnected to recreate large networks. These modules may consist of proteins that interact on similar time scales and in similar sub-cellular locations to regulate defined functions (functional decomposition) [2, 4–6]. The modules themselves can be complex, often because of combinatorial reactions between three or more species, in which case simplification of the modules is desirable. For instance, in our previous study, a detailed model of the GTPase cycle consisted of 48 reaction parameters, and was simplified to 17 parameters [7]. 
We present here a novel method for coarse-graining highly connected, complex modules. The method is also extensible for coarse-graining large networks which are not highly modular.

The generation of coarse-grained models (CGMs) via order reduction for linear systems is well studied [8]; but for nonlinear systems, including most biological systems, model simplification is not well studied and is not as straightforward as for linear systems [5, 9]. The main factors leading to complexity in biological systems are the presence of multiple reactions and processes, multiple time scales and many species. Based on Tikhonov's theorem [10], a well-known principle of model reduction in this context is to eliminate biochemical processes that are very fast (use quasi-steady-state approximation) or very slow (assume constant) compared with the characteristic time scale of interest of a biochemical system [11]. For other biochemical processes, time-scale decomposition is still suitable because nonlinearities can be handled by making algebraic approximations for the fast-evolving states [12, 13]. A succinct review of the existing methods for model simplification is provided in our previous study [7, 14]. These methods include lumping, sensitivity analysis and time-scale analysis [13], state-space reduction methods [15], optimisation-based parameter and/or state elimination approaches using a genetic algorithm (GA) [16] or integer programming [17, 18]. In the optimisation-based approaches, the aim is to eliminate the maximum number of state variables and reactions or parameters subject to the constraint that the remaining model still fits the desired experimental data and satisfies other required constraints. Some examples of constraints are (1) thermodynamic constraints imposed on rate constants involved in thermodynamic cycles (second law of thermodynamics), (2) maximum values of rate constants as dictated by diffusion limits, (3) constraints (e.g. on rate constants) gleaned from prior experiments available in the literature and (4) constraints on the maximum and minimum values of data to reflect noise or error. 
All these methods assume that the values of the parameters are known. The traditional state-space reduction is a powerful approach [15] but is mature enough only for linear systems. Further, a biochemist cannot easily relate the states of the reduced network model to the states (e.g. concentrations of components) in the detailed network model [13]. For complexity reduction, state elimination is more useful but more challenging, whereas reaction (connection or mechanism) elimination is easier but less effective for whole networks [16–18]. In the state-space reduction approach, new states are derived as a linear combination of the states in the original detailed model [13, 15]. It is possible that some of the original state variables are eliminated altogether. Because of the linear combination step, these methods are suitable for simplifying models of linear systems and less applicable to nonlinear systems. For nonlinear systems, an ideal state-space reduction approach would require a nonlinear transformation, which is difficult owing to the numerous possible types (and orders, in the case of polynomials) of nonlinear functions. For biological systems, which are inherently nonlinear, an elegant choice can be the use of Michaelis–Menten or Hill-dynamics type flux expressions to lump several elementary reactions in series [19]. One recent approach to model simplification has focused on modularity and uses a state-space reduction-based methodology while placing considerable emphasis on the measurable outputs of the model [20, 21]. For complex modules, reaction elimination is still effective, especially if the number of nodes and reactions is large because of the combinatorial increase in the number of proteomic states. 
Recently, *Maurya et al.* [7] presented a multiparametric variability analysis (MPVA) based approach in which unknown parameters could be estimated using experimental data, and unimportant reactions could be eliminated through an importance-based ranking of the reaction parameters. However, the method is recursive and hence time consuming in that the size of the model is reduced in several iterations by eliminating a few parameters in each round. In addition, the parameter-elimination space is not searched well even in a pseudo-global sense.

To avoid the iterations involving manual knockout of the parameters and to better search the parameter and network structure space, a mixed-integer nonlinear-programming (MINLP) based methodology for model reduction is presented here. In this method, the structure (e.g. reaction network) and the parameters of the CGM are determined simultaneously by solving the optimisation problem using a GA [22]. Therefore the method simultaneously identifies the set of reactions that should be retained in the biochemical network and the associated parameter values required to satisfy the experimental data and other user-specified constraints.

The basic philosophy is still the elimination of reactions. MINLP builds upon an approach whereby Edwards *et al.* [16] used a GA on an integer program to eliminate reactions but assumed that parameter values were known. Simultaneously eliminating reactions and optimising parameters in MINLP has a distinct advantage over integer programming-based methods. Since the estimated values of parameters for a detailed model are only approximations of the corresponding true values, during coarse-graining, the values of the retained parameters should not be exactly equated to their estimated values for the detailed model. Instead, the values of such parameters could be constrained around their estimated values for the detailed model. By doing so, the MINLP methodology can determine a model that, compared with the models identified by the technique of Edwards *et al.* or similar integer programming techniques [16–18], has the same topology but an improved fit to data or, in the best case, a smaller network topology.

The contribution of the research presented here is 2-fold: (1) to develop an MINLP-based novel approach to coarse-graining and (2) to show, through a simple but biologically meaningful case study, how coarse-graining can help gain insights into and generate hypotheses for complex biochemical mechanisms. In the next section, the MINLP framework is presented. In the following section, the application of MINLP for coarse-graining with a detailed model of a GTPase cycle signalling module recently published by Bornheimer *et al.* [23] is presented. The module is complex because of the combinatorial reactions between the receptor, G-protein and GTPase-activating protein (GAP), including 48 reactions. Using the MINLP method, the module was coarse-grained to 17 reactions. An alternative CGM with 17 reactions was also obtained. The two CGMs highlight two distinct mechanisms of the GTPase cycle. The current CGMs are improved over the CGMs obtained earlier manually or through our MPVA approach [7].

## 2 Methods: simultaneous determination of network topology and estimation of parameters

In a general parameter-estimation problem, model parameters (*p*_{i}) are estimated by minimising the fit error between experimental data and model predictions while satisfying appropriate constraints. For model reduction, the parameter-estimation problem is extended by including binary variables (*u*_{i}) to indicate whether or not a reaction is retained in the CGM. The key idea is to substitute each parameter, say *p*_{i}, by the expression *u*_{i} ∗ *p*_{i}, and then to minimise a suitable objective function with respect to both *p*_{i} and *u*_{i}. *u*_{i} = 1 or 0 implies that the parameter is retained or eliminated, respectively. The resulting optimisation problem is a mixed-integer nonlinear program.

Complex expressions in which some parameters should be retained or be eliminated simultaneously can be handled by introducing appropriate constraints on a case-by-case basis. As an example, in a Michaelis–Menten flux expression, both *V*_{max} and *K*_{M} should be either retained or eliminated together. The essential idea is to focus on mechanisms. If only *V*_{max} is removed, then it means that only the reaction ‘conversion of the enzyme-substrate complex into the product and the enzyme’ is removed. Although this is legitimate from a theoretical standpoint, it is a meaningless form of biochemical model reduction. Similarly, if a Hill-dynamics mechanism is to be removed, then all three parameters (*V*_{max}, *K*_{M} and the Hill coefficient) should be removed. In general, binary variables should only be directly multiplied with parameters such as *V*_{max}, *k*_{cat} or *k*_{f} and *k*_{b}, and not with *K*_{M}, and special care should be taken with complex reaction terms, discontinuous functions or any expression that involves the multiplication of parameters where factorisation is not possible. These ‘lumped’ mechanisms are already coarse-grained descriptions of reactions. The MINLP methodology can treat all these situations, but it is most suitable for reducing modules comprising detailed reaction mechanisms. This also makes the translation of mathematical results into the network structure transparent.
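The substitution of each rate parameter by its product with a binary switch can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function names, the toy reaction A + B ⇌ C and all numerical values are assumptions made for the example.

```python
def reversible_flux(kf, kb, uf, ub, a, b, c):
    """Net flux of A + B <-> C with each rate constant masked by a binary switch."""
    # uf, ub in {0, 1}: 1 retains the forward/backward reaction, 0 eliminates it.
    return uf * kf * a * b - ub * kb * c


def michaelis_menten_flux(vmax, km, u, s):
    """Michaelis-Menten flux with a single switch on Vmax; K_M is never multiplied by u."""
    # Setting u = 0 removes the whole lumped mechanism, so Vmax and K_M drop out together.
    return u * vmax * s / (km + s)


# Hypothetical concentrations and rate constants, chosen only for illustration.
full = reversible_flux(kf=2.0, kb=0.5, uf=1, ub=1, a=1.0, b=3.0, c=4.0)
no_reverse = reversible_flux(kf=2.0, kb=0.5, uf=1, ub=0, a=1.0, b=3.0, c=4.0)
mm_off = michaelis_menten_flux(vmax=10.0, km=2.0, u=0, s=5.0)
```

Placing the switch on *V*_{max} alone, rather than on *K*_{M}, keeps the flux expression well defined when the mechanism is switched off and removes both parameters from the model at once.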

The objective function of the MINLP is composed of two terms: (1) the number of retained parameters and (2) an expression to reflect the fit error so as to differentiate between CGMs with an equal number of retained parameters but different structures. The procedure is as follows.

Input: (1) The mathematical model (**Ω**) comprising the model equations and (2) the optimisation problem for parameter estimation consisting of the fit error as the objective function, *e*({*p*_{i}}, **Ω**), and constraints (see below).

- Given the mathematical model (**Ω**), substitute each parameter *p*_{i} (e.g. *V*_{max}, *k*_{cat} or *k*_{f} and *k*_{b}) by *u*_{i} ∗ *p*_{i}, *u*_{i} being a binary variable. The fit error becomes a function of {*p*_{i}}, {*u*_{j}} and **Ω**. The objective function is $\sum_{j} u_{j} + \alpha \ast e(\{p_{i}\},\{u_{j}\},\Omega)$. *α* is a factor used to adjust the relative weight of the fit error *e* compared with the size of the reduced model (explained later).
- Transform the constraints (if any) appropriately.
- The mixed-integer nonlinear program (MINLP) is

  $$\begin{array}{l}
  \underset{\{p_{i}\},\{u_{j}\}}{\min}\ \text{obj} = \sum_{j} u_{j} + \alpha \ast e(\{p_{i}\},\{u_{j}\},\Omega)\\
  \text{s.t.}\quad e(\{p_{i}\},\{u_{j}\},\Omega) \le e_{\text{th}}\\
  h_{k}(\{p_{i}\},\{u_{j}\}) = 0,\quad k = 1,\dots,m_{1}\\
  g_{l}(\{p_{i}\},\{u_{j}\}) \le 0,\quad l = 1,\dots,m_{2}\\
  p_{i,\text{LB}} \le p_{i} \le p_{i,\text{UB}},\quad i = 1,\dots,n_{1};\quad u_{j} \in \{0,1\},\quad j = 1,\dots,n_{2}
  \end{array}\tag{1}$$

  where *n*_{1} is the number of parameters to be optimised, *n*_{2} the total number of parameters that can be eliminated, *m*_{1} the number of equality constraints and *m*_{2} the number of inequality constraints. Different indices, *i* and *j* on *p* and *u*, respectively, are used to indicate that some of the fixed parameters can also be possibly eliminated and that some of the estimated parameters can be specifically retained if appropriate. The fit error, *e*, is a weighted sum of squared errors between model prediction and experimental data. *e*_{th} is the threshold on the fit error and is usually decided empirically on the basis of an acceptable fit error for the detailed model and error or noise estimate in experimental data based on repeated experiments. In the case of no repeats, for time-course data with many time-points, noise can be estimated using a denoising technique such as wavelets [24]. For non-time-course data, such as a dose-response curve, if no repeats are available, as is the case with the system modelled in this article, then the precision of the instrument can be used as a guideline to set the acceptable threshold.

  The value of *α* can be decided so that *α* ∗ *e*({*p*_{i}}, {*u*_{j}}, **Ω**) is less than one. Then, the number of retained parameters is just the integral part of the objective function and provides an efficient way of differentiating between the networks with different sets of retained reactions, that is, different topologies. One choice of *α* is: *α* = 1/(*e*_{th} + ε) where ε is a small positive number, ensuring that *α* ∗ *e*({*p*_{i}}, {*u*_{j}}, **Ω**) = *e*/(*e*_{th} + ε) < 1 because of the inequality constraint *e* ≤ *e*_{th}. The use of ε can be avoided by using the strict inequality *e* < *e*_{th}. It might appear that the objective function is dominated by the size of the reduced network and hence the goodness of fit to data plays an insignificant role resulting in trivial models. However, this is not true because of the constraint: *e*({*p*_{i}}, {*u*_{j}}, **Ω**) ≤ *e*_{th}. If necessary and as dictated by available information, other constraints, formulated in terms of the binary variables, can be imposed on the network structure itself. Whereas the choice of the fit-error threshold, *e*_{th}, itself provides protection from oversimplification, the factor *α* also provides some protection from oversimplification: the larger its value, the lesser the degree of possible oversimplification.
- Solve the resulting constrained MINLP using a GA [7, 22] or a deterministic search-based approach [25]. We have used a GA. It can be noted that if a deterministic search-based approach is used and all possible networks of the minimal size are sought, then the second term of the objective function can be removed (i.e. *α* = 0).
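The role of *α* in the objective function can be made concrete with a short sketch. It assumes the choice *α* = 1/(*e*_{th} + ε) described above and returns infinity for candidates that violate *e* ≤ *e*_{th}; the function name, the infinite penalty and the numerical values are illustrative assumptions, not part of the published method.

```python
import math


def minlp_objective(u, fit_error, e_th, eps=1e-9):
    """Return sum(u) + alpha * e with alpha = 1 / (e_th + eps); inf if e > e_th."""
    if fit_error > e_th:
        return math.inf  # infeasible candidate: violates the constraint e <= e_th
    alpha = 1.0 / (e_th + eps)
    return sum(u) + alpha * fit_error


# Hypothetical candidates: the same topology with two different fits, and a larger topology.
obj_a = minlp_objective([1, 1, 0, 1], fit_error=0.8, e_th=1.0)
obj_b = minlp_objective([1, 1, 0, 1], fit_error=0.2, e_th=1.0)
obj_c = minlp_objective([1, 1, 1, 1], fit_error=0.0, e_th=1.0)

# The integer part of a feasible objective value is the number of retained parameters.
n_retained_a = math.floor(obj_a)
```

Because the scaled fit error stays below one for every feasible candidate, a network with fewer retained parameters always ranks ahead of a larger one, and the fit error only breaks ties between networks of equal size.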

Solving the MINLP will result in a CGM composed of the minimum number of reactions and associated parameter values needed to fit the experimental data and constraints within the allowable limit of the parameter values. The MINLP solution also contains an estimate of the parameter values of reactions not included in the CGM. However, the values of these eliminated parameters should not be used elsewhere because the associated binary variables were set to zero and, thus, these parameter values were not tested by the objective function for fit to the experimental data and biochemical constraints. This is a price to be paid for the simplification achieved. If necessary for any other purpose, these parameters can always be estimated with respect to the detailed model. The crucial aspect to be noted is that the MINLP approach provides a combination of the topology of the reduced network and the corresponding rate/flux parameters, which ‘together’ are consistent with the experimental data and constraints used. They should not be used in isolation without proper justification. Needless to say, accurate estimation of each and every parameter of the detailed model is not the intent of coarse-graining.

The use of constraints is critical for the development of good CGMs. If certain species/components are experimentally measured, then all the important reactions involved in the consumption or production of such species will be retained in order to maintain a good fit. It is important to note that in our approach it is the reactions that are eliminated and not the species. Even for reactions, one can require that certain mechanisms always be retained. This has been applied to one of the CGMs for the case study presented. If one requires that all reactions be retained, then no simplification happens; if only some are required, simplification can still occur in other parts of the network. A species is only eliminated if all the reactions involving this species are eliminated, and this cannot happen if the species is measured and is used for data-fitting. However, if too few data are used, then some important mechanisms may be eliminated unless they are retained by imposing a constraint explicitly. In short, one should not try to reduce models based on very little data and very few biochemical constraints. To some extent, a lack of numeric data can be compensated for by using qualitative information and biochemical knowledge.
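The rule that a species disappears only when every reaction involving it is eliminated can be checked directly from the binary variables. The sketch below uses a hypothetical three-reaction network; the reaction labels, species sets and the `retained_species` helper are assumptions for illustration and do not correspond to the detailed model.

```python
def retained_species(reactions, u):
    """Species that still take part in at least one retained reaction."""
    kept = set()
    for (_name, species), keep in zip(reactions, u):
        if keep:  # keep == 1 means the reaction is retained in the CGM
            kept.update(species)
    return kept


# Hypothetical network: each entry is (label, set of species involved in the reaction).
reactions = [
    ("r1", {"G", "GT"}),
    ("r2", {"G", "A", "GA"}),
    ("r3", {"GA", "GAT"}),
]

kept = retained_species(reactions, u=[1, 0, 0])

# A measured species must survive the reduction; here "GT" is assumed to be measured.
measured = {"GT"}
measured_ok = measured <= kept
```

In practice the fit-error constraint enforces this implicitly for measured species, so a check of this kind mainly serves as a sanity test on a candidate topology.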

In a general case, where all parameters are unknown and can potentially be eliminated (i.e. *n*_{1} = *n*_{2}), the computational complexity of the MINLP method is a little more than twice that of the parameter-estimation problem if a stochastic search-based approach is used. Multiplication with binary variables is computationally trivial, but the total number of unknowns (binary variables plus reaction rate parameters) is about twice the number of reaction rate parameters, so a somewhat larger population size and number of generations are recommended. Hence, the overall factor is a little more than two. The cost of simulating the resulting network tends to scale with the number of retained reactions because the number of state variables is usually not reduced substantially. As with any methodology for model development, the amount and quality of the experimental data used strongly affect the resulting CGM. Overall, the MINLP approach is considerably faster than our previous MPVA approach because MINLP is fully automated and does not involve multiple iterations of manual parameter knockout.
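To show how binary switches and rate parameters can share one chromosome, the following is a deliberately small elitist GA on a toy problem. It is a sketch under stated assumptions and not the GA of [22]: the two-term model, the synthetic data, the bounds, the mutation scheme and every name in the code are hypothetical.

```python
import random

X = [0.5, 1.0, 1.5, 2.0]
Y = [2.0 * x for x in X]               # synthetic data from y = 2x (assumption)
E_TH, EPS = 0.05, 1e-9                 # hypothetical fit-error threshold and epsilon
BOUNDS = [(0.1, 10.0), (0.001, 10.0)]  # hypothetical (p_LB, p_UB) for each parameter


def fit_error(u, p):
    """Sum of squared errors of the masked toy model y = u0*p0*x + u1*p1*x^2."""
    return sum((u[0] * p[0] * x + u[1] * p[1] * x * x - y) ** 2 for x, y in zip(X, Y))


def objective(ind):
    """Number of retained parameters plus scaled fit error; inf if e > e_th."""
    u, p = ind
    e = fit_error(u, p)
    if e > E_TH:
        return float("inf")
    return sum(u) + e / (E_TH + EPS)


def mutate(ind, rng):
    """Either flip one binary switch or rescale one parameter within its bounds."""
    u, p = list(ind[0]), list(ind[1])
    i = rng.randrange(len(u))
    if rng.random() < 0.3:
        u[i] = 1 - u[i]
    else:
        lo, hi = BOUNDS[i]
        p[i] = min(hi, max(lo, p[i] * rng.uniform(0.8, 1.25)))
    return (tuple(u), tuple(p))


def run_ga(start, generations=200, pop_size=20, seed=0):
    """Mutation-only GA with elitist truncation selection, seeded with a feasible start."""
    rng = random.Random(seed)
    pop = [start] + [mutate(start, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=objective)
        parents = pop[: pop_size // 2]  # the best half survives unchanged
        children = [mutate(rng.choice(parents), rng) for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=objective)


# Start from the "detailed model": both terms retained, parameters at assumed estimates.
START = ((1, 1), (2.0, 0.01))
best = run_ga(START)
```

Seeding the population with the feasible detailed model and keeping the best half each generation means the returned candidate is never worse than the starting point. The chromosome holds two genes per parameter, which mirrors the roughly twofold increase in unknowns noted above.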

## 3 Test case study: a CGM for GTPase-cycle module

The GTPase-cycle module controls signal transduction in heterotrimeric G protein signalling networks by regulating the activity of heterotrimeric G proteins. Therefore it serves as a key upstream control point for many cellular processes such as gene transcription and cell-cycle regulation [26]. In the module, G protein-coupled receptors (GPCRs) activate G proteins by accelerating the exchange of GDP for GTP, and GAPs deactivate G proteins by accelerating the hydrolysis of GTP to GDP. Several mechanisms may operate within the GTPase cycle based upon local concentrations, affinity and reaction kinetics of the active receptor, G protein and GAP. These mechanisms and the rich variety of resulting signalling phenomena were explored extensively in our detailed model of the GTPase cycle (Bornheimer *et al.* [23] and Fig. 1a). The network diagram of this module is common to both small and heterotrimeric G-proteins, but the specific data used to constrain the model come from the GTPase cycle of m1 muscarinic receptor, Gq, and regulator of G protein signalling 4 (RGS4). In this case, Gq-GDP is activated by active m1 muscarinic acetylcholine receptor, which stimulates the exchange of GDP for GTP forming Gq-GTP, and deactivated by the GAP RGS4, which hydrolyses Gq-GTP to Gq-GDP. The module is complex because of the combinatorial reactions between the receptor, G protein and GAP, including 48 reactions. These are included to represent competing hypotheses about the mechanisms operating within this module. The detailed model is succinctly described in the Appendix.

Two of the mechanisms, collision coupling and what we term as the ternary complex mechanism, are of special interest. Collision coupling is the classical mechanism [27–31] in which inactive G-GDP binds the active receptor and is released as active G-GTP. This active G-GTP can signal downstream and/or the GAP can bind and hydrolyse G-GTP to G-GDP (Fig. 1a). This mechanism is included in the detailed model as the path GD → RGD → RG → RGT → GT (where GAP is free to act or not) → GD. Recent data, however, are consistent with a ternary complex mechanism (Bornheimer *et al.* [23] and Fig. 1b). This mechanism refers to a complex of the active receptor, G protein and GAP that persists throughout the GTPase cycle. It is represented in the detailed model by the path RGA → RGAT → RGAD. Various mechanisms to hold the ternary complex together have been suggested including direct binding of GAP to Receptor [32], kinetic scaffolding in which the reaction rate of GAP-catalysed GTP hydrolysis exceeds the receptor dissociation and the rate of receptor-catalysed GDP/GTP exchange exceeds GAP dissociation [33], and physical scaffolding by an unidentified additional protein [34, 35]. More recent studies and reviews also reveal the importance of computational models of the GTPase cycle and the ternary complex [31, 36, 37]. A kinetic scaffolding-based model has been recently used to study the ‘spatial focusing’ of active G protein [38]. What is unclear is whether and in which cases one or some combination of these operate and whether they differ in capacity to regulate G protein activity.

Previously, we used MPVA to reduce the detailed model and learned that the ternary complex was required to recapitulate key data in the GTPase cycle of m1 muscarinic receptor, Gq and RGS4 [7]. However, MPVA has two main drawbacks: because the task of choosing which reactions to knock out based on the variability of the associated parameter values is iterative and manual, MPVA is slow and is not exhaustive in searching the whole model-structure space. MINLP circumvents both drawbacks. In so doing, good-quality CGMs could be found representing all our previous results. Indeed all CGMs retained the models arising from a collision coupling mechanism and the ternary complex mechanism. The unique predictions because of MPVA came in the mechanism of entry and exit from the ternary complex and the resulting signalling phenomena. The CGM that best fits available data features a novel, non-intuitive entrance to the ternary complex, and at physiological concentrations of the active receptor and the GAP, a unique regulation of G-protein activity is predicted, which is bimodal.

### 3.1 Review of the detailed model

The detailed model ([23] and Fig. 1a) has 24 reversible reactions, represented by 17 ordinary differential equations (ODEs) and 48 reaction rate parameters. The key model outputs are the fraction of active G protein (Z), and the GTP turnover rate (*v*) in the steady state [23]. The expressions for Z and *v* (at steady state) are

where [G]_{total} is the sum of the concentrations of all species involving the G protein. The ‘+’ or ‘−’ sign as a subscript in the parameter names (e.g. in *P*_{−4}) indicates that the corresponding reaction is an association or dissociation reaction, respectively. This convention is the same as that used in [23] and is slightly different from that used in [7]. In our simulations, a steady state is defined and numerically approximated as the magnitude of the time derivatives for each of the state variables being <10^{−18} M/s, as detailed by Bornheimer *et al.* [23].
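The derivative-based steady-state criterion can be illustrated on a toy system. The sketch below integrates a hypothetical reversible conversion A ⇌ B with explicit Euler steps until the time derivatives fall below 10^{−18} M/s; the reaction, rate constants, step size and integrator are assumptions for illustration and are not the 17-ODE model or the solver used in [23].

```python
def relax_to_steady_state(a, b, kf, kb, dt=0.1, tol=1e-18, max_steps=100000):
    """Explicit-Euler integration of A <-> B until |d[A]/dt| = |d[B]/dt| < tol (M/s)."""
    for step in range(max_steps):
        rate = kf * a - kb * b  # net A -> B flux in M/s; d[A]/dt = -rate, d[B]/dt = +rate
        if abs(rate) < tol:
            return a, b, step
        a -= rate * dt
        b += rate * dt
    raise RuntimeError("steady state not reached within max_steps")


# Hypothetical initial concentrations (M) and first-order rate constants (1/s).
KF, KB = 1.0, 1.0
a_ss, b_ss, n_steps = relax_to_steady_state(a=1e-8, b=0.0, kf=KF, kb=KB)
```

For a stiff model with many state variables, a dedicated ODE solver would normally replace the fixed-step loop, but the stopping test on the largest time derivative stays the same.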

The detailed model fitted the experimental data on steady-state *v* and Z well and predicted that local concentrations of the active receptor and the GAP regulate steady-state Z and *v* within four LSRs [7, 23]. These results for the detailed model are reproduced here (Figs. 1b and 1c) and briefly summarised. In LSRs, the active *R* and the GAP are either absent or present in saturating concentrations such that only one of the four so-called extreme pathways is active (Fig. 1c). An increase in [R_{active}] leads to an increase in both the fractional activation of the G protein (Z) and the turnover rate of GTP hydrolysis (*v*), an indicator of the turnover rate of signalling dynamics, whereas an increase in [A] increases *v* but decreases Z. In terms of Z and *v*, the four LSRs are characterised by a low Z and the lowest *v* in the LSR G, the lowest Z and a low *v* in the LSR GA, the highest Z and a high *v* in the LSR RG, and a high Z and the highest *v* in the LSR RGA (Fig. 1c). Thus, the LSR RGA is specialised for both a large peak response and rapid signalling. Essentially, by controlling the local concentration of the active receptor and the GAP, cells can exhibit a variety of responses spanning several orders of magnitude in Z and *v*.

To coarse-grain the GTPase cycle network, the data used for optimisation through MINLP consisted of the same experimental data and constraints used in the detailed model [23], especially data on the GTP turnover rate (*v*) under different conditions (Fig. 1b) and two experimental data points on Z [23] (not shown). Also included were three simulated data points on Z and *v* from the region between the LSR G and the LSR GA predicted by the detailed model (panel 1 in Fig. 1c). The philosophy behind using the three simulated data points on Z is that in conjunction with the sigmoid shape of the predicted Z against [A] curve, they are sufficient to capture the curve quantitatively. In addition, to ensure mass balance, it was required that the count of the input and the output fluxes at any node involving a G protein were either both zero or greater than zero, a necessary condition to ensure the connectivity of each of the four extreme pathways (Fig. 1a) in the reduced network. A further constraint was imposed to retain at least one of the reactions for the binding of the G protein to GAP (*A*_{+1}, *A*_{+2} or *A*_{+3}). To evaluate the agreement between simulations and experimental data, we used a fit-error function, *α* ∗ *e*({*p*_{i}}, {*u*_{j}}, **Ω**), a weighted sum of squared errors, computed as described in our previous work [7] and summarised in the Appendix.
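The two structural constraints described here, flux connectivity at each G-protein node and retention of at least one G protein-GAP binding reaction, can be expressed as simple checks on a candidate topology. The sketch treats every retained flux as a directed edge and uses the collision-coupling path GD → RGD → RG → RGT → GT → GD as a test network; the edge-list representation and the function names are assumptions made for illustration.

```python
def node_flux_counts(retained_edges, node):
    """Count retained fluxes entering and leaving a node."""
    n_in = sum(1 for _src, dst in retained_edges if dst == node)
    n_out = sum(1 for src, _dst in retained_edges if src == node)
    return n_in, n_out


def satisfies_connectivity(retained_edges, nodes):
    """True if, at every node, input and output flux counts are both zero or both positive."""
    for node in nodes:
        n_in, n_out = node_flux_counts(retained_edges, node)
        if (n_in == 0) != (n_out == 0):
            return False
    return True


def retains_gap_binding(u_a1, u_a2, u_a3):
    """At least one of the binary switches for A+1, A+2 or A+3 must equal 1."""
    return u_a1 + u_a2 + u_a3 >= 1


nodes = ["GD", "RGD", "RG", "RGT", "GT"]
cycle = [("GD", "RGD"), ("RGD", "RG"), ("RG", "RGT"), ("RGT", "GT"), ("GT", "GD")]
broken = cycle[:-1]  # dropping GT -> GD leaves GT with an input flux but no output flux

cycle_ok = satisfies_connectivity(cycle, nodes)
broken_ok = satisfies_connectivity(broken, nodes)
```

Checks of this kind correspond to constraints formulated purely in terms of the binary variables, so they can be evaluated before any simulation of a candidate network.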

### 3.2 A 17 parameter CGM obtained using LSR data on Z and v that predicts a biphasic dose-response

#### 3.2.1 Results of coarse-graining

In five trials of a GA-based solution of the MINLP, a CGM with 17 parameters, called CGM-bi, was generated (Fig. 2a). The fit to experimental data (Fig. 2b) is nearly as good as that for the detailed model and all the LSRs in the Z and *v* plots are captured well (the two panels in Fig. 2c).

A key feature of MINLP is the simultaneous search of the model-structure and model-parameter spaces such that previously unconsidered solutions may result (so long as they fit all experimental data) and/or competing hypotheses may be distinguished. For the GTPase cycle, the best solution of the MINLP, CGM-bi, contains a surprising mechanism governing the ternary complex of the active receptor, G protein and GAP. Experimental data supports the existence of this complex [32, 34, 35] and we previously showed that the ternary complex was essential [7]; the mechanism for the formation of the ternary complex in our previous CGM [7] differed from the corresponding mechanism in CGM-bi. According to the CGM-bi, formation of the ternary complex occurs when the active receptor associates with the G-GAP-GDP (GAD) complex and is driven by the receptor concentration. This contrasts with a model in which the GAP joins a receptor-G-protein complex immediately after activation ([33, 39]; simulated below) and is consistent with the idea of association of the RGS with the GPCR [32] (reviewed in: [32, 37, 40–42]) and association of a complex containing RGS and inactive G protein [34, 42].

#### 3.2.2 Biological relevance

The CGM-bi can explain a long-standing unexplained phenomenon in pharmacology, that is, the biphasic (bell-shaped) signalling response to certain GPCR agonists. At cellular GAP concentrations (about 50–1000 nM), the dose-response curve of Z (which reflects G-protein activity) is biphasic with increasing levels of the active receptor. As [*R*_{active}] increases, Z first increases, reaches a maximum level and then decreases before saturating (Fig. 2c, left panel, the biphasic curve indicated with the black arrow). This phenomenon has not previously been attributed to RGS proteins.

To illustrate how this predicted role of RGS can explain empirical data on the biphasic dose response, we turn to the M2 muscarinic acetylcholine receptor (m2AChR), which is similar to m1AChR used in our model, except that it signals not through Gq but through Gi to inhibit the production of cyclic adenosine monophosphate (cAMP). Stimulation of m2AChR-expressing Chinese hamster ovary (CHO) cells with carbachol leads first to inhibition of cAMP through the activity of Gi, but at higher carbachol concentrations to the recovery of cAMP [43], resulting in a biphasic dose-response curve. Reducing the density of m2AChR by binding low concentrations of the antagonist oxyphenium results in the loss of cAMP recovery and, hence, the dose-response curve is sigmoidal. These surprising results gave rise to the hypothesis that a high density of active m2AChR can activate Gs (Gs opposes Gi by stimulating cAMP production), but the evidence to support this has so far been indirect [43, 44]. Our model suggests an alternative hypothesis. In simulations, as the density of active receptor is increased in the presence of 50–1000 nM RGS, Gi activity first rises and then falls (Fig. 2c). Thus, in the m2AChR data, cAMP would recover simply because Gi activity falls because of the RGS. When the receptor density is reduced by oxyphenium treatment, the dose-response curve would become sigmoidal because at low receptor concentrations the biphasic regime is not encountered. Thus, the RGS in the context of the CGM-bi can explain biphasic dose-response data. One way to test this hypothesis in the m2AChR-expressing CHO cells is to knock down endogenous Gi using small-interfering RNA and then transfect it with a mutant Gi that does not interact with RGS proteins (Gly to Ser mutation [45]). After stimulating with carbachol, biphasicity in the cAMP response should be eliminated.

#### 3.2.3 Simulation-based explanation of biphasic response

To demonstrate in detail how the CGM-bi produces a biphasic response, we performed several additional simulations. The CGM-bi and the detailed model were simulated for total [G] = 10 nM and [RGS] = 100 nM. The total effective active-receptor concentration, [*R*_{active}], was varied from 10^{−15} M to 10 μM, as was done for generating the plot shown in Fig. 2c, and the steady-state concentration of various species at *t* = 250 s was recorded. The resulting data for the CGM-bi and the detailed model are shown in Figs. 3a and 3b, respectively. The biphasic behaviour of Z in the CGM-bi is reflected in a biphasic profile of the steady-state [G * T] against [*R*_{active}] (Fig. 3a). Biphasicity arises from the nature of formation of the ternary complex in the CGM-bi. The only route for its formation is the reaction GAD → RGAD (the reverse reaction is absent). Only when [*R*_{active}] is large does this reaction proceed rapidly (as it must to compete against rapid dissociation of the GAD), resulting in the ternary complex and the reaction cycle RGAD → RGA → RGAT → RGAD, in which G is well regulated, pressuring the total amount of active G downwards. Lower [*R*_{active}] assures that the ternary complex is not formed and, hence, that regulation of G-protein activity by the RGS is reduced and the fraction of active G increases. Quantitatively, as [*R*_{active}] increases, the quantity of active G * T (Fig. 3a, the curve with circle marks), which is the dominant form of the G protein at the peak ([RGS] = 100 nM, [*R*_{active}] ~ 3–10 nM), is reduced by about a factor of 6–8 at very large [*R*_{active}] compared with its value at the peak. Correspondingly, the amount of active G protein in the ternary complex (RG * AT) increases to about the same amount as free G * T (other states of active G are much lower). 
Together, these simulations suggest a mechanism whereby at moderate concentrations of the active receptor and GAP, most G proteins are activated (Z ~ 1), but as the receptor concentration is increased, the G proteins, GAP and receptor are drawn into a ternary complex that functionally increases GAP activity, decreasing G-protein activity (Z ~ 0.21).

**Variation in the steady-state concentration of different complexes of G protein, receptor and GAP at 100 nM total [GAP] and increasing total concentration of active receptor (R)**

In contrast, the species-concentration profile for the detailed model is substantially different and does not predict biphasic behaviour. At medium and large [*R*_{active}], the dominant active form of G protein is [RG * T] as opposed to [G *T]. [RG * T] has a sigmoid shape for the detailed model as opposed to a biphasic shape for the CGM-bi. RG * T can also bind GAP to enter the ternary complex, and a substantial fraction of active G is RGA * T, which also has a sigmoidal shape. The shape of G * T is still biphasic, but the fraction of active G in this form is small. The contribution of the ternary complex at moderate active receptor concentrations in the detailed model because of the reaction RG * T → RG * AT restrains maximum G-protein activity to about half that of the CGM-bi (Z ~0.5 compared to ~1).

In short, biphasicity in the CGM-bi arises mainly because of the mechanism of formation of the ternary complex. Because of the absence of the reaction RG * T → RG * AT, the formation of ternary complex takes place only through the route GAD → RGAD as opposed to multiple routes for the formation of the ternary complex in the detailed model. The ternary complex idea was first proposed in 1996 [33] but has not been suggested to regulate signalling biphasically as predicted by the CGM-bi in this study. We recognise that the mechanism proposed by the CGM-bi is unconventional, but note that it agrees with extensive GTPase cycle data from reconstituted vesicle systems used in this study [33, 46] and is consistent with concepts of an RGS-receptor coupling independent of G-protein and an RGS-G-protein coupling independent of G-protein activity, and with cell-based and in vitro data in other heterotrimeric GTPase cycle systems [32, 34, 35, 47]. In the near future, we and our collaborators plan to test the prediction that the RGS can affect the biphasic dose response by using wild-type and RGS-insensitive G proteins in mammalian cell models of muscarinic acetylcholine signalling.

### 3.3 Ability to add known biochemical information as constraints: another 17 parameter CGM (CGM-sig)

Although the CGM-bi was produced automatically by MINLP as the best fit to extensive available data, its mechanism may be controversial. In this section, we have forced MINLP to retain an alternative entry to the ternary complex, which was proposed as early as 1996 [33]. In this mechanism, the GAP is drawn into the complex immediately after G-protein activation RG * T → RG * AT. The ternary complex is maintained because the kinetic events along the GTPase cycle (reactions *T*_{+4}, *P*_{−4}, *D*_{+4}) are faster than competing protein dissociation events [23, 33, 39]. MINLP was, thus, programmed to retain both *A*_{+5} and *A*_{−6} by setting the corresponding binary variables to 1. With these changes, MINLP produced a new CGM with 17 reactions (CGM-sig) (Fig. 4a). Reactions *A*_{+5} and *A*_{−6} are retained as required, but compared with CGM-bi, reactions *R*_{−5} (RG * AT → G * AT) and *R*_{+6} (GAD → RGAD) are lost. Thus, RG * T → RG * AT is the only route for the formation of the ternary complex. The fit to experimental data for the CGM-sig is shown in Fig. 4b. Although the fit in panel 3 (*v* against [GAP]) in Fig. 4b is not as good as that for the CGM-bi (panel 3 of Fig. 2b), it is still quite good; there is little difference in the EC_{50} value of [GAP]. Similar arguments can be made about the fit in panel 2. The LSRs are also predicted accurately for both the Z and *v* plots (Fig. 4c), that is, they are similar to those for the detailed model although there are noticeable differences in the approach to the LSR RGA among the three models.

The slight deterioration in the fit observed in panels 2 and 3 of Fig. 4b for CGM-sig compared with the fit in Fig. 2b for the CGM-bi is not totally surprising since with the addition of more constraints, the optimal value of the objective function for a minimisation problem can only increase or remain unchanged [48]. Since the number of reactions/parameters retained is the same for the CGM-bi and the CGM-sig, the fit error for the CGM-sig is slightly worse than that for the CGM-bi. A careful comparison of the LSRs in Figs. 2c and 4c reveals another difference between these models: at very high [*R*_{active}], the IC_{50} (for Z) or EC_{50} (for *v*) of [GAP] for the CGM-sig is higher than that for the CGM-bi.
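The argument that added constraints cannot lower the optimum can be illustrated with a toy structure search. The sketch below is not the GTPase model: the four "reactions", their costs and the misfit penalty are hypothetical values chosen only to show that forcing a binary retention variable to 1 shrinks the feasible set, so the constrained minimum is never below the unconstrained one.

```python
from itertools import product

# Hypothetical per-reaction costs; these numbers are illustrative only.
TOY_COSTS = [0.4, 0.1, 0.3, 0.2]


def toy_objective(mask):
    """Toy fit error for a tuple of 0/1 retention flags (one per reaction)."""
    retained_cost = sum(c for c, y in zip(TOY_COSTS, mask) if y)
    # Assumed penalty: dropping the last reaction ruins the fit.
    misfit = 0.0 if mask[3] else 1.0
    return retained_cost + misfit


def best_structure(forced_on=()):
    """Enumerate all masks, keeping only those with the forced flags set to 1."""
    feasible = [
        m for m in product((0, 1), repeat=len(TOY_COSTS))
        if all(m[i] == 1 for i in forced_on)
    ]
    return min(feasible, key=toy_objective)


free_best = best_structure()
forced_best = best_structure(forced_on=(0,))  # analogous to fixing a binary variable to 1
print(free_best, toy_objective(free_best))
print(forced_best, toy_objective(forced_best))
```

Exhaustive enumeration is used here only because the toy problem has 16 structures; the study itself relies on a GA for the search.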

The automatically generated CGM-sig is similar but superior to a manually constructed CGM of 15-parameters that we developed previously (Fig. 8 of [7]). The CGM-sig has two additional reactions (*P*_{+2} and *A*_{+3}). The fit to experimental data for the CGM-sig and the 15-parameter CGM [7] are similar. However, the quantitative values of Z and *v* in the LSRs G and GA are better captured (closer to the detailed model) by the CGM-sig compared with the 15-parameter CGM in our previous study [7]. This suggests that the main role of the reactions *P*_{+2} and *A*_{+3} is to ensure quantitative correctness in the very low [*R*_{active}] regimes, that is, LSRs G and GA. This conclusion is supported by the CGM-bi that also contains the reactions *P*_{+2} and *A*_{+3} and predicts values of Z and *v* in the LSRs G and GA similar to those by the CGM-sig. In short, MINLP automatically determined an improved model with only two additional parameters.

### 3.4 Comparison of the values of the parameters in the CGMs with their values in the detailed model

Table 1 lists the values of the parameters that were retained in the CGM-bi or the CGM-sig and were optimised or computed using equilibrium constraints [7]. These values correspond to the fit and predictions shown in Figs. 2 and 4. Their values in the detailed model are also listed for comparison. Besides these parameters, each CGM included five parameters that were known and fixed (not optimised or computed). These are: *D*_{−1} = 1.0 × 10^{−4}, *D*_{−3} = 2.0, *P*_{−1} = 0.013, *P*_{−2} = *P*_{−4} = 25. The remaining parameters are broadly classified in four categories. Let *f* = (*x* − *x*_{full_model})/*x*_{full_model} be the deviation factor of a parameter value for a CGM compared with its value in the detailed model. Then, the four categories are:

- −0.5 ≤ *f* ≤ 2.0 (Table 1; superscript a),
- 2 < *f* ≤ 10 (superscript b),
- *f* > 10 (superscript c) and
- *f* < −0.50 (superscript d).
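The four categories can be expressed as a small classification routine. The following is a minimal sketch; the function names are ours and are not taken from the original program.

```python
def deviation_factor(x, x_full_model):
    """Relative deviation f = (x - x_full_model) / x_full_model."""
    return (x - x_full_model) / x_full_model


def deviation_category(f):
    """Map a deviation factor to the superscript used in Table 1 (a, b, c or d)."""
    if f < -0.5:
        return "d"   # f < -0.50
    if f <= 2.0:
        return "a"   # -0.5 <= f <= 2.0
    if f <= 10.0:
        return "b"   # 2 < f <= 10
    return "c"       # f > 10


# Example with made-up values: a CGM value three times the detailed-model value
# gives f = 2.0, which falls on the upper edge of category a.
print(deviation_category(deviation_factor(3.0, 1.0)))
```

Note that the categories are asymmetric: a parameter may be up to three times its detailed-model value and still be in category a, but falls into category d once it drops below half that value.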

The parameters that deviate from the detailed model the least across different CGMs are *R*_{+3} (least), *D*_{−4}, *T*_{+4}, *T*_{+3} and *A*_{+2} (in that order). For all CGMs, the value of *R*_{+3} is very close to 10^{8}, the upper bound used during optimisation, suggesting that the bounds chosen could alter the results. Below, we discuss the results of comparing the range of parameter values for the two CGMs.

For a chosen CGM, a parameter-value set is considered good if the fit to experimental data is good and its prediction of the LSRs is similar to the predictions shown in Fig. 2 (for CGM-bi) or Fig. 4 (for CGM-sig). The results of such a comparison of parameter-value ranges over good sets are displayed in Fig. 5. Most parameters show a similar range across all models. As observed in Table 1, these include *T*_{+3}, *A*_{+2}, *R*_{+3}, *T*_{+4} and *D*_{−4}, which are retained in all models. The parameters *A*_{+5} and *R*_{+6} have similar values in the models in which they are retained. The parameters *P*_{+2}, *A*_{+3}, *A*_{−3}, *A*_{−6}, *R*_{−5} and *R*_{−2} vary by up to two orders of magnitude across different models. For these parameters (except *R*_{−2}), the within-model variation (each vertical bar) is actually the largest for the detailed model, suggesting that for the CGMs, most parameters are well constrained. The parameter *T*_{+1} has a very wide range in the CGMs, up to five orders of magnitude for the CGM-bi, whereas it is less than one order of magnitude for the detailed model. Interestingly, at least part of the range in the detailed model overlaps with the range for the CGMs, suggesting that the value of *T*_{+1} can be fixed based upon the detailed model, and the coarse-graining methodology should be able to find suitable values of other parameters for the reaction networks for the CGM-bi and the CGM-sig. To verify this, for each CGM reaction network topology, for each good set, a pseudo-quantitative Euclidian distance of the retained parameter values from the corresponding MIN or MAX in the detailed model was computed. To compute the distance, if the value of a chosen parameter was within the MIN and MAX, then the difference was considered to be zero, otherwise, the log10 ratio of MIN/value (when value < MIN) or value/MAX (when value > MAX) was taken as the difference. 
Then, the distance was computed as the square root of the sum of the squared differences and the good set with the least distance was identified. For the CGM-bi and the CGM-sig, the value of *T*_{+1} for the closest good set is marked with a ‘ * ’ sign in Fig. 5. For other parameters, it is not marked since the range is less than one order of magnitude. For *T*_{+1}, these values shown in Fig. 5 (7.31 × 10^{4} and 1.05 × 10^{6} for the CGM-bi and the CGM-sig, respectively) are quite different from the values reported in Table 1. This value for the CGM-sig is close to the MAX value in the detailed model, whereas for the CGM-bi, it is about MIN/4. Hence, these values would have to be fixed. It may be possible that a value of *T*_{+1} for the CGM-bi within MIN and MAX of the detailed model does not exist, however, it cannot be concluded from the pool of simulation data generated. Overall, except for one parameter, we are able to find good agreement between the parameter values of the CGMs and the detailed model.
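The distance computation described above can be sketched as follows. The parameter names and numerical ranges in the example are placeholders and are not values from Fig. 5; only the rule (zero inside [MIN, MAX], log10 ratio outside, then a Euclidean norm) follows the text.

```python
import math


def log_gap(value, lo, hi):
    """Zero inside [lo, hi]; otherwise the log10 ratio to the nearer bound."""
    if value < lo:
        return math.log10(lo / value)
    if value > hi:
        return math.log10(value / hi)
    return 0.0


def distance_to_ranges(values, ranges):
    """Square root of the sum of squared log-gaps over the retained parameters."""
    return math.sqrt(sum(log_gap(v, *ranges[name]) ** 2 for name, v in values.items()))


def closest_good_set(good_sets, ranges):
    """Return the good set with the smallest distance to the detailed-model ranges."""
    return min(good_sets, key=lambda s: distance_to_ranges(s, ranges))


# Placeholder MIN/MAX ranges and two hypothetical good sets.
ranges = {"T+1": (1e5, 1e6), "A+2": (1.0, 10.0)}
good_sets = [
    {"T+1": 1e4, "A+2": 5.0},     # one decade below MIN for T+1
    {"T+1": 1e8, "A+2": 1000.0},  # two decades above MAX for both
]
print(closest_good_set(good_sets, ranges))
```

Working on a log10 scale means that a parameter lying one decade outside its range contributes the same amount regardless of its absolute magnitude, which suits rate constants spanning many orders of magnitude.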

## 4 Conclusions

Model simplification is an important problem in systems biology. We developed an MINLP-based method for model reduction and tested it on a detailed model of the GTPase-cycle module. The method is particularly suitable for model reduction of biological systems that are nonlinear and for which little is known about parameter values. The computational complexity of the MINLP methodology is a little more than twice that of parameter estimation only by a stochastic search, and is much faster than our previous MPVA approach since no iterations are involved. In our case study of the GTPase cycle, in terms of good fit to data, prediction of the four LSRs and retaining the active ternary complex RG * AT, the CGMs obtained by MINLP are comparable to or better than a CGM developed earlier by using a MPVA approach [7]. Additionally, by automatically searching the model-structure space, MINLP found an alternative mechanism of the GTPase cycle that predicts that, in the presence of the RGS, increasing the density of the active receptor (analogous to increasing the ligand) can result in a biphasic (bell-shaped) response in G-protein activity, a result that has been observed in pharmacology but never explained by the RGS. At this stage, new experimental investigations, for example using mutant Gi that do not interact with the RGS [45], are required to validate these mechanisms. This highlights the greatest strength of MINLP: by exhaustively and simultaneously searching the model-structure and model–parameter space against the available data, MINLP can identify novel CGM mechanisms. In general, MINLP is expected to substantially reduce and, thus, make tractable models of highly connected biochemical networks.

## Acknowledgments

This work was supported by National Institutes of Health (NIH) collaborative grants U54 GM62114 (Alliance for Cellular Signalling, SS) and U54 GM69338-04 (LIPID Metabolites And Pathways Strategy, SS), NIH grant R01-GM068959 (SS), NIH grant HL087375 (SS), NSF grant DBI-0641037 (SS) and a grant from the Hilblom foundation (SS). We thank Prof. Elliot M. Ross (The University of Texas Southwestern Medical Center) for the experimental data used in this work.

M.R. Maurya designed the algorithm and developed the computer program (prototype) for the MINLP method, implemented the MINLP framework for the GTPase cycle module case study and wrote the manuscript. S.J. Bornheimer assisted in designing the simulations, interpreting the results and writing the manuscript. V. Venkatasubramanian and S. Subramaniam supervised the overall research and the preparation of the manuscript.

## 7 Appendix

#### 7.1 Model for GTPase cycle module ([23] and Fig. 1a)

The detailed model consists of 17 ODEs and 48 parameters. There are 24 reversible reactions, named as *R*1, *R*2 and so on, in the network. For the reversible reaction *A*1, the association and dissociation rate constants are denoted by *A*_{+1} and *A*_{−1}, respectively, and the equilibrium constant (i.e. ratio of the rate constant of the forward reaction to that of the reverse reaction) is denoted by 1/*A*1. Other reactions also follow this convention. Note that some forward reactions are association reactions (such as *A*_{+i}, *R*_{+i}, *T*_{+i} where *i* = 1, 2, 3 and so on), whereas some other forward reactions are dissociation reactions (such as *P*_{−i}, *D*_{−i}). The unit of the rate constant for association reactions is *M*^{−1} *s*^{−1} and that for dissociation reactions is *s*^{−1}. The ODEs are derived by applying mass balance to various species in the network (the 12 shown nodes in Fig. 1a and *R*, *A*, *T*, *D* and *P*). For example

Similar differential equations are written for the concentration of other components. There are 15 distinct faces in the three cubes, each of which gives a thermodynamic constraint (corresponding to a thermodynamic cycle with four reversible reactions); for example, the constraint for the face involving *G*, *GA*, *G* * *AT* and *G* * *T* is *A*1 * *T*2 = *T*1 * *A*2 (involves eight rate constants). Four additional thermodynamic constraints are generated from the thermodynamic cycles corresponding to the four horizontal paths (cycles with three reversible reactions) based upon the free energy change (ΔG°) for GTP hydrolysis (for example, *P*1 * *D*1 * *T*1 = 1/*K*_{eq} = 1/(1.457 × 10^{9}), where ΔG° = −*RT* ln(*K*_{eq}) in kcal/mol (*T* = 25°C = 298.15 K), is one of the four constraints). Out of 19 constraints, only 13 are linearly independent, which are used to calculate 13 parameters using the values of the other 35 parameters. Seven parameters are fixed. The calculated parameters are required to be within their bounds. Apart from these, four additional constraints (*A*2/*A*3 > 100; *A*5/*A*6 > 100; 5.0 × 10^{−8} < 1/*A*2 and 1/*A*5 < 1.5 × 10^{−7}) that are based on experimental data are used to restrict the search space. Thus, 28 parameters are estimated using the GA-based optimiser to fit the experimental data (explained later in this section). The expressions for the 13 parameters are given below (for simplicity, the names of the parameters ‘*X*_{−i}’ and ‘*X*_{+i}’ are written as ‘*X*−*i*’ and ‘*X*+*i*’).
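The two kinds of thermodynamic constraint quoted above can be checked numerically. The sketch below assumes the gas constant in kcal mol^{−1} K^{−1} and uses made-up values for *A*1, *T*1 and *A*2; only *K*_{eq} = 1.457 × 10^{9} and *T* = 298.15 K come from the text. The helper names are ours.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K); a standard value, not from the paper


def delta_g0(k_eq, temp_k=298.15):
    """Standard free-energy change, dG0 = -R T ln(K_eq), in kcal/mol."""
    return -R_KCAL * temp_k * math.log(k_eq)


def solve_t2(a1, t1, a2):
    """Dependent quantity T2 from the face constraint A1 * T2 = T1 * A2."""
    return t1 * a2 / a1


def face_constraint_holds(a1, t2, t1, a2, rel_tol=1e-9):
    """Check the four-reaction cycle condition A1 * T2 = T1 * A2."""
    return math.isclose(a1 * t2, t1 * a2, rel_tol=rel_tol)


# K_eq from the text gives a free-energy change of roughly -12.5 kcal/mol.
print(round(delta_g0(1.457e9), 2))

# Hypothetical values for three of the four quantities fix the fourth.
t2 = solve_t2(a1=2.0, t1=3.0, a2=4.0)
print(t2, face_constraint_holds(2.0, t2, 3.0, 4.0))
```

Solving each independent constraint for one quantity in this way is what allows 13 of the 48 parameters to be computed rather than searched.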

#### 7.2 Objective function [7]

In a chosen data set (panel 1, 2 or 3 in Fig. 1b), all data points, except the middle point, were given uniform weight. The weight assigned to the middle point was five times higher. Next, to normalise for variation in the values across different data sets, the first data set (i.e. the experimental data in panel 1 of Fig. 1b) was chosen as a reference and the error (experimental value − predicted value) in other data sets was scaled by the ratio of the maximum value in panel 1 to the maximum value in the respective data sets. Data sets 4 and 5 consist of single data points (at half-maximal inhibitory concentration, IC_{50}, of the GAP; not shown in Fig. 1b). Their weight is unity. An expression for the objective function can be written as follows. Let *v*_{i, j} and *Z*_{i, j} denote the predicted value of *v* and *Z*, respectively, corresponding to the *j*th data point of the *i*th data set. The corresponding experimental values of *v* or *Z* are indexed by the same subscripts (*i*, *j*). Let *ω*_{i, j} denote the weight assigned to them. Further, let *v*_{i, max} and *Z*_{i, max} denote the maximum values of *v* and *Z*, respectively, in data set *i*. Then, the objective function is

where *ω*_{1,5} = *ω*_{2,4} = *ω*_{3,5} = 5 (all others are 1).
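A weighted sum of squared errors of the kind described above can be sketched in a few lines. This is one reading of the verbal description (errors scaled by the ratio of the reference maximum to the data-set maximum, then squared and weighted); it is not a transcription of the published expression, and the data in the example are invented.

```python
def weights_with_emphasis(n_points, emphasised_index, factor=5.0):
    """Uniform weights of 1 with one point (0-based index) weighted more heavily."""
    weights = [1.0] * n_points
    weights[emphasised_index] = factor
    return weights


def weighted_sse(data_sets):
    """Weighted sum of squared, scaled errors.

    data_sets: list of dicts with keys 'pred', 'exp' and 'weights'.
    The first data set is the reference used for normalisation (an assumption
    that mirrors the description of panel 1 as the reference).
    """
    ref_max = max(data_sets[0]["exp"])
    total = 0.0
    for ds in data_sets:
        scale = ref_max / max(ds["exp"])  # brings each data set to the reference scale
        for pred, exp, w in zip(ds["pred"], ds["exp"], ds["weights"]):
            total += w * (scale * (exp - pred)) ** 2
    return total


# Invented numbers: a three-point reference set and a single-point set.
example = [
    {"pred": [1.0, 2.0, 3.0], "exp": [1.0, 2.0, 4.0], "weights": [1.0, 1.0, 1.0]},
    {"pred": [1.0], "exp": [2.0], "weights": [1.0]},
]
print(weighted_sse(example))
```

The emphasised index is passed explicitly because the text identifies the heavily weighted point per data set (for example the fifth point of the first set) rather than by a general rule.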

For the three additional data points from Fig. 1c (panels 1 and 2), [*R*_{active}] = 10^{−15} M and [A] = [10^{−12}, 10^{−8}, 10^{−5}] M. The corresponding (pseudo-experimental) *Z* = [7.35 × 10^{−3}, 5.44 × 10^{−3}, 8.96 × 10^{−5}] and *v* = [9.55 × 10^{−5}, 1.02 × 10^{−5}, 1.96 ×10^{−3}] s^{−1}. The weight vector [1 5/3 1] is used. The expression for the fit error on *Z* is

It can be noted that the normalisation scheme in (4) is slightly different from that used in (3). This choice is somewhat arbitrary and a scheme identical to that used in (3) could have been used. A similar expression is used to compute the fit error on the *v*-LSR data. The overall objective function is (obj_{v,Z} + obj_{Z,LSR} + obj_{v,LSR}).

#### 7.3 GA for optimisation and parameter estimation

The GA code used in this work is derived from the public-domain GA code, GAlib, available from MIT (http://lancet.mit.edu/ga/) and has been modified. Exact details are presented in our previous work [49]. The basic strategy of GA is explained below.

GA is based upon Darwin's theory of natural selection and survival of the fittest [22, 50] and has been successfully used in many optimisation problems. In an optimisation problem, an objective function is to be minimised by manipulating the values of the free parameters within certain bounds (search space). GA is a population-based search technique. The population consists of members. Each member corresponds to a particular set of parameter values. Thus, an objective value is associated with each member. For a minimisation problem, a member with a lower objective value is considered fitter compared with a member with a higher objective value. In the classical GA [22], the parameters in each member are represented through a binary string (called genome). For real-parameter optimisation problems, a floating-point representation has been recently proposed by Wolf and Moros [51] and later used by Katare *et al.* [52]. The number of bits reserved for each parameter depends on the allowed range [upper bound (UB) − lower bound (LB)] of the parameter and the desired accuracy in its solution. For example, if LB = 10, UB = 50 and the required precision is 0.1, then the required number of bits ≥ ceil(log_{2}((50 − 10)/0.1 + 1)) = 9 bits, where ceil is the ceiling function. The search starts by randomly choosing a population of initial guesses. Depending upon the range (small or large), the values of the parameters in the initial population can be chosen either uniformly (for small ranges) or on a logarithmic scale (for large ranges). Then, the objective function is evaluated for each member and the members are rank-ordered according to a decreasing objective. Thus, the fittest member is the last in the sorted population. Each member is assigned a fitness value based upon its objective and the objective for other members. The best member (which fits the data best) has the highest fitness. Next, crossover and mutation operators are used to generate offspring from parents. 
The fitter a member is, the greater its chance of undergoing crossover (mating) and, hence, of passing its genes to the next generation. Either both crossover and mutation can be applied or only crossover or mutation can be applied. In crossover, there is a high possibility of exchanging large parts of the genomes. In mutation, only a few bits become mutated randomly. Crossover brings large changes, whereas mutation results in small changes in most cases although the actual change depends upon the location of the bit in the string. A crossover probability of 90% is quite common. Thus, if a random number between 0 and 1 is <0.9, then the chosen parents mate, otherwise, offspring are generated by mutating the parents. The mutation probability is kept low (e.g. 1–2%). There are several rules to choose the parents for crossover and mutation [53]. One parent can be chosen on the basis of rank in the population and the other can be chosen on the basis of fitness [51, 52]. In some applications, some of the best members of the current population are directly transferred to the next generation as decided by an elitism fraction [52, 53]. Usually, the population size [about 5–10 times the number of unknowns (parameters to be estimated)] is kept fixed for each generation. Thus, an appropriate number of new members are generated by applying the mutation and crossover operators to the parents. The process of going from the current generation to the next generation is called evolution. GA works on the premise that the members in the next generation are on an average fitter than in the current generation. In other words, it is likely that some of the members of the next generation fit the data better compared with members in the current generation. However, the objective value for the best member in the next generation is not always better than that of the best member of the current generation. Thus, the best member improves in discrete steps. 
Elitism ensures that the best member of the next generation is at least as good as the best member of the current generation [52]. The process of evolution continues for a fixed number of generations, or several generations until a solution with the desired objective value is found, or most of the members of the population become similar (based upon a genome similarity measure). Sometimes, a GA-based search can become trapped in local minima; to avoid this, the mutation rate can be increased. In fact, depending upon the diversity of the population, one may schedule the mutation rate to ensure enough diversity. It is also advisable to carry out several runs of the GA starting with different initial guesses. At the end of the GA, one can either accept the member with the lowest objective value as the solution or analyse/post-process several members with the objective value close to the best member. For parameter-estimation problems, the latter alternative is beneficial [52] since the user can then judge the collection of models having low objective value; this is useful because the lowest objective value may not correspond to the best model because of the difficulty of designing a flawless objective function and of the noise or errors in the data.
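The strategy described above can be condensed into a minimal binary-coded GA for a single bounded parameter. This is an illustrative sketch, not the modified GAlib code used in the study: the function names, the tournament selection rule (swapped in here for the rank- and fitness-based parent selection mentioned above) and all default settings are our assumptions.

```python
import math
import random


def bits_needed(lb, ub, precision):
    """Number of bits required to resolve [lb, ub] to the given precision."""
    return math.ceil(math.log2((ub - lb) / precision + 1))


def decode(bits, lb, ub):
    """Map a list of 0/1 bits to a real value in [lb, ub]."""
    as_int = int("".join(map(str, bits)), 2)
    return lb + (ub - lb) * as_int / (2 ** len(bits) - 1)


def evolve(objective, lb, ub, n_bits, pop_size=20, generations=40,
           p_cross=0.9, p_mut=0.02, n_elite=2, seed=0):
    """Minimise objective(x) on [lb, ub]; returns (best x, best objective per generation)."""
    rng = random.Random(seed)

    def score(genome):
        return objective(decode(genome, lb, ub))

    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=score)                        # lowest objective (fittest) first
        history.append(score(pop[0]))
        next_pop = [g[:] for g in pop[:n_elite]]   # elitism: carry over the best members
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (an assumed, simple selection rule).
            parent_a = min(rng.sample(pop, 2), key=score)
            parent_b = min(rng.sample(pop, 2), key=score)
            if rng.random() < p_cross:
                cut = rng.randint(1, n_bits - 1)   # single-point crossover
                child = parent_a[:cut] + parent_b[cut:]
            else:
                # Otherwise mutate a parent: flip each bit with low probability.
                child = [b ^ 1 if rng.random() < p_mut else b for b in parent_a]
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=score)
    history.append(score(best))
    return decode(best, lb, ub), history


n_bits = bits_needed(10, 50, 0.1)                  # the worked example from the text
best_x, history = evolve(lambda x: (x - 30.0) ** 2, 10.0, 50.0, n_bits)
print(n_bits, best_x, history[-1])
```

Because the elite members are copied unchanged, the recorded best objective never increases from one generation to the next, which is the property attributed to elitism above. In an MINLP setting, extra bits in the genome could play the role of the binary reaction-retention variables, although that extension is not shown here.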
