- We are sorry, but NCBI web applications do not support your browser and may not function properly. More information

- Journal List
- Bioinformatics
- PMC2660871

# Simulation of large-scale rule-based models

^{1}Michael I. Monine,

^{2}James R. Faeder,

^{3}William S. Hlavacek,

^{2,}

^{4}Daniel D. Von Hoff,

^{5}and Richard G. Posner

^{1,}

^{6,}

^{*}

^{1}Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ 85004,

^{2}Theoretical Division and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545,

^{3}Department of Computational Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15260,

^{4}Department of Biology, University of New Mexico, Albuquerque, NM 87131,

^{5}Clinical Translational Research Division, Translational Genomics Research Institute, Phoenix, AZ 85004 and

^{6}Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ 86011, USA

## Abstract

**Motivation:** Interactions of molecules, such as signaling proteins, with multiple binding sites and/or multiple sites of post-translational covalent modification can be modeled using reaction rules. Rules comprehensively, but implicitly, define the individual chemical species and reactions that molecular interactions can potentially generate. Although rules can be automatically processed to define a biochemical reaction network, the network implied by a set of rules is often too large to generate completely or to simulate using conventional procedures. To address this problem, we present DYNSTOC, a general-purpose tool for simulating rule-based models.

**Results:** DYNSTOC implements a null-event algorithm for simulating chemical reactions in a homogenous reaction compartment. The simulation method does not require that a reaction network be specified explicitly in advance, but rather takes advantage of the availability of the reaction rules in a rule-based specification of a network to determine if a randomly selected set of molecular components participates in a reaction during a time step. DYNSTOC reads reaction rules written in the BioNetGen language which is useful for modeling protein–protein interactions involved in signal transduction. The method of DYNSTOC is closely related to that of StochSim. DYNSTOC differs from StochSim by allowing for model specification in terms of BNGL, which extends the range of protein complexes that can be considered in a model. DYNSTOC enables the simulation of rule-based models that cannot be simulated by conventional methods. We demonstrate the ability of DYNSTOC to simulate models accounting for multisite phosphorylation and multivalent binding processes that are characterized by large numbers of reactions.

**Availability:** DYNSTOC is free for non-commercial use. The C source code, supporting documentation and example input files are available at http://public.tgen.org/dynstoc/.

**Contact:** gro.negt@cotsnyd

**Supplementary information:** Supplementary data are available at *Bioinformatics* online.

## 1 INTRODUCTION

Models for biochemical reaction networks are commonly specified in terms of (i) a list of reactions; (ii) a visual layout of the reactions in a system or a reaction scheme diagram, which may be enhanced using annotative iconography (Kitano *et al.*, 2005); or (iii) a set of ordinary differential equations (ODEs), in which one equation is included for each possible chemical species in a network. Such models can also be specified in terms of reaction rules (Hlavacek *et al.*, 2006), which provide a high-level representation of the molecular interactions in a system that give rise to individual reactions and chemical species. Rules are particularly useful when one wishes to account for molecular substructure and/or site-specific details of molecular interactions in a model, as illustrated by the didactic example of Danos (2007). The BioNetGen language (BNGL) (Blinov *et al.*, 2006; Faeder *et al.*, 2005b, in press) and the closely related κ-calculus (Danos and Laneve, 2004; Danos *et al.*, 2007a) are examples of formal languages that have been developed for precisely specifying models for biochemical systems in terms of reaction rules. Other related modeling frameworks include dynamical grammars (Mjolsness and Yosiphon, 2006), ρ_{bio}-calculus (Andrei and Kirchner, 2008), and BlenX (Dematté *et al.*, 2008). BNGL can be processed by the BioNetGen software package to generate a reaction network automatically from a set of BNGL-encoded rules (Blinov *et al.*, 2004; Faeder *et al.*, 2005a; http://bionetgen.org/). The rule-derived network can then be simulated using conventional methods that take a reaction network as input. Rules have been used most commonly to model signal-transduction systems (Hlavacek *et al.*, 2006), but rules have been used to model other types of biochemical systems as well (for example, see Mu *et al.*, 2007).

A set of rules often implies a vast reaction network (Danos *et al.*, 2007a; Hlavacek *et al.*, 2003, 2006). Deriving a network from such a set of rules is computationally expensive, in large part because of the need to routinely solve graph isomorphism problems (Blinov *et al.*, 2006). Moreover, simulating a model for such a network (if it can be fully derived) may be impractical with conventional methods. For example, the computational cost of numerically integrating ODEs, which is often the most efficient approach for simulating a small reaction network, depends non-linearly on *N*, the number of ODEs (i.e. the number of chemical species in a network). For stiff ODEs, the cost of numerical integration typically scales with *N*^{3}, and simulating a system for which *N* is larger than 10^{4}−10^{5} can be prohibitively expensive. The computational cost of stochastic simulation methods, such as Gillespie's method and most subsequent improvements of this method (Gillespie, 2007; Li *et al.*, 2008), depends on *M*, the number of reactions in a system. Schulze (2007) and Slepoy *et al.* (2008) have recently described stochastic simulation methods with costs independent of *M*, but to apply either of these methods to simulate a rule-based model, one must first generate a network from rules, which again is costly.

The expense of network generation can be avoided or reduced in several ways. It can be reduced by terminating network generation before the full list of potential reactions is obtained and then simulating the resulting partial network, but the accuracy of results is uncertain. Network generation is minimized in a principled way in the on-the-fly method of stochastic simulation (Faeder *et al.*, 2005a; Gillespie, 2007; Lok and Brent, 2005), in which only reactions that connect populated and reachable chemical species are generated. However, on-the-fly simulation of a highly branched reaction network is significantly slowed by the costs of network generation (Hlavacek *et al.*, 2006; Yang *et al.*, 2008). Simulation is sometimes made tractable by methods for reducing the size of rule-based models (Borisov *et al.*, 2005; Conzelmann *et al.*, 2006). However, model reduction methods are not generally applicable. New approaches and software tools are needed for simulation of large-scale rule-based models, i.e. models of systems described by rules that imply large numbers of possible chemical species and reactions.

To address this need, two closely related particle- or agent-based methods have recently been developed for simulating rule-based models (Danos *et al.*, 2007b; Yang *et al.*, 2008). In these methods, a time increment is sampled from an exponential distribution, a rule is selected from among a weighted list of rules, just as reactions are sampled in Gillespie's method, and the selected rule is used to generate a reaction event (i.e. to select reactants to participate in a type of reaction defined by the selected rule). Reaction network generation is avoided. Software implementing the method of Danos *et al.* (2007b), called Kappa Factory, is available from Plectix BioSystems, Inc. (W.Fontana, personal communication). This software is capable of processing model specifications defined using κ. Similar general-purpose software implementing the method of Yang *et al.* (2008), which extends the method of Danos *et al.* (2007b), will be available soon for simulating BNGL-encoded models.

Here, to further address the need for tools that can simulate large-scale rule-based models, we present DYNSTOC, an open-source software that implements an extension of the StochSim simulation method (Le Novère and Shimizu, 2001; Morton-Firth and Bray, 1998; Shimizu and Bray, 2001). The method implemented in StochSim is an agent-based null-event simulation procedure. In the extension of this procedure, BNGL-encoded reaction rules play an integral role, and there is no requirement for an a priori specification of the individual reactions implied by the rules. The main difference between DYNSTOC and StochSim is in the ability of DYNSTOC to use a set of rules specified in an expressive model-specification language to assess whether randomly selected molecular components are able to participate in a reaction. This difference greatly expands the range of rule-based models that can be simulated using the StochSim/DYNSTOC approach.

## 2 METHOD

The conventions of the model-specification language BNGL, the graphical foundations of BNGL, the graph-rewriting approach used to interpret BNGL-encoded reaction rules and the StochSim simulation approach have been described in detail elsewhere (Blinov *et al.*, 2006; Faeder *et al.*, 2005b, in press; Hlavacek *et al.*, 2006; Morton-Firth and Bray, 1998; Shimizu and Bray, 2001). Below, we present a generalization of the StochSim simulation procedure that incorporates the representational conventions of BNGL. The StochSim method, a null-event kinetic Monte Carlo (KMC) method, has sometimes been characterized as an approximate method; it is not. Properly parameterized, it produces the same statistical distribution of events as a rejection-free KMC method, such as Gillespie's method (for an informative review of null-event and rejection-free KMC methods; see Chatterjee and Vlachos, 2007).

### 2.1 Representational conventions

Molecules and molecular complexes are represented by graphs, and molecular interactions or reaction types are represented by graph-rewriting rules. Graphs are comprised of nodes, labels associated with nodes and edges that connect nodes. Nodes represent the reactive molecular components of a system (e.g. sites and domains of proteins), which are tracked individually during a simulation. In other words, each node is associated with a unique index. Components may be associated with multiple internal states (e.g. a tyrosine residue may be labeled as either phosphorylated or unphosphorylated). Labels give the names of components and their internal states. Edges represent bonds between components. A graph-rewriting rule identifies the necessary and sufficient properties of reactants in a particular type of reaction, the products that result from this type of reaction given a set of reactants, and a rate law for reactions defined by the rule. In the simulation procedure presented here, the rate law associated with a rule is used to determine the probability that a reaction occurs within a given fixed time step. We assume that rate laws associated with rules characterize elementary reactions. In short, we use standard BNGL to represent molecules and molecular interactions with the added feature of unique node indices. These indices are used only for tracking purposes in the simulation procedure described below, and the tracking index associated with a node does not affect its reactivity in any way.

### 2.2 Algorithm

The simulation procedure is comprised of the following steps, which are repeated until a specified stopping criterion is satisfied.

- The current time is incremented by a fixed time step Δ
*t*, during which at most one reaction is allowed to occur. Selection of Δ*t*is discussed below in Section 2.5. The value of Δ*t*determines the resolution with which one can determine when events occur. With each time increment, a reaction is attempted as follows. - A decision is made to select one or two nodes from among the nodes representing reactive components in the system of interest. [We focus on reactions that affect the connection(s) and/or internal state(s) of one or two nodes. It is straightforward to extend the method as presented here to enable consideration of additional types of reactions, such as termolecular reactions and synthesis, degradation and transport reactions.] One node is selected with probability π
_{1}, and two nodes are selected with probability π_{2}=1−π_{1}. Selection of π_{1}is discussed below in Section 2.4. - Depending on the outcome of the previous step, either a single node or a set of two nodes is selected randomly. Each node is selected with uniform probability. The selection procedure is such that the same node can be selected twice, but if a node is selected twice, the following steps are skipped and a null event is said to occur during the time step.
- The selected node or pair of nodes is checked against all reaction rules specified for the reaction system of interest to determine if the node or pair of nodes qualifies as a reactant or set of co-reactants in one or more types of reactions defined by the rules. This procedure often requires that the local environment of a selected node be compared against information included in a rule, as described elsewhere (Blinov
*et al.*, 2006), to determine if the environment is permissive for reaction. For example, if phosphorylation of a tyrosine depends on co-localization with a kinase, the presence or absence of the kinase in the local environment of the tyrosine must be established before one can determine whether the tyrosine can be phosphorylated. At the end of this step, a list of possible reactions has been identified, including potential reactants. If no reaction is possible, the following steps are skipped and a null event occurs. - Each possible reaction is labeled with an integer index
*r*[1,…,*M*_{j}], where*M*_{j}is the number of possible reactions for iteration*j*of the simulation procedure, and a probability*p*_{r}is calculated for each reaction. The calculation of reaction probabilities is discussed below in Section 2.3. - A uniform deviate ρ(0, 1) is generated and used to determine which, if any, of the
*M*_{j}possible reactions occurs by finding the smallest integer*R*≤*M*_{j}that satisfies , where*p*_{r}is the probability calculated in Step 5 for reaction*r*. If no value of*R*satisfies the inequality, the following step is skipped and a null event occurs. - The graph-rewriting operation of the rule defining the reaction with index
*R*is applied to the graph(s) representing the reactant(s) identified in Step 4. The graph-rewriting process has been described elsewhere (Blinov*et al.*, 2006).

### 2.3 Calculation of the probability of accepting a reaction

If a rule indicates that a set of nodes selected in Step 3 of the simulation procedure (see Section 2.2) can participate in a reaction *r* (e.g. a rule indicates that a selected pair of nodes can be connected by an edge or a rule indicates that a selected single node can change its internal state), the reaction is accepted with probability *p*_{r} (Step 6). In general, the value of *p*_{r} is chosen such that *p*_{r}*P*_{r}/Δ*t*=*v*_{r}*N*_{A}*V*, where *v*_{r} is the rate law (inherited from the governing rule) that gives the molar rate of reaction *r* (in units such as Ms^{−1}), *N*_{A} is Avogadro's number, *V* is the volume of the reaction system and *P*_{r} is the probability of selecting a set of nodes that can participate in reaction *r* in Step 3 of the simulation procedure. In other words, *p*_{r} is chosen such that the expected number of reactants consumed in reaction *r* per time step as a result of applying the simulation procedure, which is given by *p*_{r}*P*_{r}/Δ*t*, matches the corresponding physicochemical turnover rate, which is given by *v*_{r}*N*_{A}*V*.

Shimizu and Bray (2001) presented a derivation of expressions that can be used to determine the acceptance probabilities that should be used for two special types of reactions. Following the approach of Shimizu and Bray (2001), we can derive similar expressions for other types of reactions. Below, we give expressions used by DYNSTOC for four types of reactions: (I) state-change reactions, (II) bimolecular association reactions, (III) dissociation reactions and (IV) unimolecular association reactions (i.e. reactions in which two parts of the same molecule or molecular complex are connected). We will refer to these types of reactions as Types I–IV. Below, we will assume that these reactions are characterized by mass–action rate laws.

The probability of accepting a state-change reaction *r*, which we will denote as*p*^{r}_{I}, is given by

where π(0, 1) is a uniform deviate generated in Step 2 of the simulation procedure, κ^{r}_{1}=*k*^{r}_{I}, *k*^{r}_{I} is a rate constant (with units such as s^{−1}) that characterizes the rate at which a component changes state through reaction *r*, *n* is the total number of components (nodes) in the reaction system of interest and Δ*t* is the fixed time step used in the simulation procedure. Recall that π_{1}=1−π_{2} is the probability of deciding to inspect a single node (versus a pair of nodes) in Step 2 of the simulation procedure. Thus, according to Equation (1), Type I reactions are fired only when the decision is made to select a single node in Step 2 of the simulation procedure.

The probability of accepting a bimolecular association reaction *r*, which we will denote as *p*^{r}_{II}, is given by

where π(0, 1) is a uniform deviate, κ^{r}_{2}=*k*^{r}_{II}/(*N*_{A}*V*) and *k*^{r}_{II} is a rate constant (with units such as M^{−1} s^{−1}) that characterizes the rate at which two (distinct) components associate through reaction *r*. The factor of 2 in the denominator of the expression for *p*^{r}_{II} arises because there are two ways (ordered sequences) in which a pair of nodes representing reactive components can be selected in Step 3 of the simulation procedure.

The probability of accepting a dissociation reaction *r*, which we will denote as *p*^{r}_{III}, is given by the right-hand side of Equation (1), except with κ^{r}_{1} redefined as follows: κ^{r}_{-}=*k*^{r}_{III}/2, where *k*^{r}_{III} is a rate constant (with units such as s^{−1}) that characterizes the rate at which two (distinct) components dissociate through reaction *r*. Note that a dissociation reaction can be uniquely identified in two ways: by selecting either of the two components that are bound to each other when the choice is made in Step 2 of the simulation procedure to inspect a single node (π≤π_{1}) or by selecting two mutually bound components when the choice is made to select a pair of components in Step 2 of the simulation procedure (π>π_{1}). For the latter case, we arbitrarily set *p*^{r}_{III}=0 because it is more efficient to identify reactions through selection of a single node than a pair of nodes. For the former case, the factor of that multiplies *k*^{r}_{III} in the expression for κ^{r}_{1} above arises because there are two ways that a single node can be selected to uniquely identify a dissociation reaction.

The probability of accepting a unimolecular (intramolecular) association reaction *r* (e.g. a reaction that connects two ends of a polymer chain to form a ring), which we will denoted as *p*^{r}_{IV}, is given by the right-hand side of Equation (2), except with κ^{r}_{2} redefined as follows: κ^{r}_{2}=*k*^{r}_{IV}, where *k*^{r}_{IV} is a rate constant (with units such as s^{−1}) that characterizes the rate at which two (distinct) components of the same molecule or molecular complex associate through reaction *r*. The factor of 2 in the denominator of the expression for *p*^{r}_{IV} arises because there are two ways (ordered sequences) in which a pair of nodes representing reactive components can be selected. We arbitrarily set *p*^{r}_{IV}=0 when only a single node is inspected in Step 2 of the simulation procedure, because in general, selection of a single node is insufficient to uniquely identify a reaction of this type.

The above expressions for acceptance probabilities are provided for purposes of illustration. One can easily derive additional expressions from the general relation *p*_{r}*P*_{r}/Δ*t*=*v*_{r}*N*_{A}*V* by following the approach of Shimizu and Bray (2001). DYNSTOC is capable of handling the most common reaction types that can be defined using BNGL—see the examples available at the DYNSTOC web site. An error message is produced if DYNSTOC encounters a reaction type that it does not recognize.

### 2.4 Selection of the number of nodes to inspect

In Step 2 of the simulation procedure (see Section 2.2), a decision is made to inspect either one or two nodes. This decision depends on the value chosen for π_{1} and it determines the types of reactions that are subsequently considered in the simulation procedure. We will refer to reactions that are considered after the selection of a single node as Type 1 reactions. Likewise, we will refer to reactions that are considered after the selection of a pair of nodes as Type 2 reactions. Let us assume that both Type 1 and Type 2 reactions are possible; otherwise, we can set π_{1}=0 or 1 trivially. The value of π_{1} may be set arbitrarily, but for efficiency, the value should be chosen to minimize null events. Following the approach of Morton-Firth and Bray (1998), we adopt the strategy of setting algorithmic parameters so that *p*_{1}^{max} =*p*_{2}^{max} =1 to minimize null events, where

for *i*[1, 2]. In Equation (3), *p*_{1}^{max}(*p*_{2}^{max}) denotes the largest probability of accepting a Type 1 (2) reaction at any step during a simulation, and accordingly, κ_{i}^{max} denotes the maximal value of ∑_{r=1}^{Mj} κ_{i}^{r} that can be calculated for a list of reactions identified in Step 4 of the simulation procedure for any iteration j of the procedure in which a decision is made in Step 2 to inspect *i*[1, 2] nodes. In other words, to minimize the occurrence of null events, we normalize the largest cumulative probability of selecting a reaction in Step 6, regardless of whether one or two nodes are inspected in Step 3.

By setting *p*_{1}^{max} =*p*_{2}^{max} and then using Equation (3) and the relation π_{2}=1−π_{1}, we find the following expression for the optimal value of π_{1}:

Note that this expression does not depend on the time step Δ*t*. Also note that the values of κ_{1}^{max} and κ_{2}^{max} depend on the connectivity of the reaction network being simulated as well as the factors, such as rate constants, indicated in Section 2.3.

By introducing the concept of pseudo nodes to set the value of (Morton-Firth and Bray, 1998), we can ensure that only three random numbers are generated in the simulation procedure for each time step instead of three or four. In this approach, π_{1} is related to the number of pseudo nodes, *n*_{0}, as follows: π_{1}=*n*_{0}/(*n*+*n*_{0}). A suboptimal value of π_{1} is likely to result from this procedure because *n*_{0} is necessarily an integer. In this case, the optimal time step (given below) must be reduced. For example, as can be easily confirmed, if π_{1} is less than the optimal value given by Equation (4) by a factor of 1−, where 0<<1, then the optimal time step must be reduced by the same factor. Note that 1− is the relative error introduced by using an integer pseudo node count that gives a value for π_{1} less than that given by Equation (4).

### 2.5 Selection of the time step

In Gillespie's method, the time step is sampled from an exponential distribution and a reaction occurs at each time step. In contrast, DYNSTOC (as well as StochSim) uses a fixed time step and rejection sampling. In other words, this approach introduces null events, i.e. time steps in which no reactions occur. The time step must be carefully chosen. If the time step is too large, more than one reaction is likely to occur (in the physical system) during a step and the accuracy of simulation is degraded. If the time step is too small, accuracy is ensured but at the expense of computational efficiency, because the cost of null events is wasted. These events do not change the state of a system.

When is given by Equation (4), we can use *p*_{1}^{max} =1 and Equation (3) to find the largest time step Δ*t* that can be used in a simulation without introducing error:

It should be noted that, although in this equation does not explicitly depend on the choice of, Equation (5) is valid only if is given by Equation (4).

DYNSTOC uses Equations (4) and (5) to automatically set the values of π_{1} and Δ*t* on the basis of estimated values of κ_{1}^{max} and κ_{2}^{max}, which are initially obtained from an inspection of the rules to be used in a simulation. The cumulative probability of accepting a reaction in Step 6 of the simulation procedure is checked at every iteration of the procedure and if this quantity is ever found to be greater than 1, DYNSTOC generates an error message reporting how Δ*t* should be manually rescaled to normalize the cumulative probability.

## 3 DEMONSTRATION AND VALIDATION

To validate DYNSTOC, we simulated a number of rule-based models and compared the results against those obtained using BioNetGen (data not shown). BNGL-encoded specifications of these models, which can be processed by both DYNSTOC and BioNetGen, are available at the DYNSTOC web site. Below, we further validate DYNSTOC by considering two challenging test-case models that cannot be simulated using BioNetGen (except in special cases) but can be simulated using DYNSTOC and independent problem-specific approaches. These test cases demonstrate the ability of DYNSTOC to simulate multisite phosphorylation and multivalent binding dynamics.

### 3.1 Test case I: multisite phosphorylation

We consider an idealized rule-based model for a system in which autophosphorylation of a receptor tyrosine kinase (RTK) can generate a multitude of receptor phosphoforms and phosphorylation-dependent adapter-bound receptor states. The model captures the interactions of a cytosolic adapter protein with a dimer of RTKs, which are tightly associated. The adapter protein is comprised of a Src homology 2 (SH2) domain, and each receptor in a dimer is comprised of an active catalytic subunit and *n* autophosphorylation sites. When a site is phosphorylated, it can bind the SH2 domain of an adapter protein. The rules of the model are given in pseudo BNGL as follows:

where ‘D’ denotes a receptor dimer, ‘*Y*_{ij}’ (*i*=1,…,*n*; *j*=1, 2) denotes the *i*-th tyrosine of the *j*-th receptor in a receptor dimer, ‘U’ denotes an unphosphorylated tyrosine, ‘P’ denotes a phosphorylated tyrosine, ‘A’ denotes an adapter protein and ‘SH2’ denotes the SH2 domain of an adapter protein. As usual in BNGL, the internal state label of a molecular component is prefixed by a tilde and a bond name is prefixed by an exclamation mark. Sharing of a bond name indicates that two molecular components are connected. The plus sign on the left-hand side of Equation (6c) indicates that the molecularity of reactions defined by this rule is two. The absence of a plus sign on the right-hand side indicates that reverse reactions have a molecularity of one. Reactions defined by the rules of Equations (6a) and (6b) also have a molecularity of one. The period on the right-hand side of Equation (6c) indicates joint membership in a complex. The above rules represent autophosphorylation of receptor tyrosines [Equation (6a)], dephosphorylation of receptor tyrosines via phosphatases not explicitly included in the model [Equation (6b)], and reversible adapter–receptor binding via SH2 domain recognition of phosphotyrosine [Equation (6c)]. An illustration of the model of Equations (6a-c) is available at the DYNSTOC web site.

The total number of rules defined in Equations (6a-c) is 6*n*, and as can easily be confirmed, the number of chemical species implied by a rule set, *N*, is given by

In this count, one species corresponds to free adapter protein, A(SH2); 3^{n} species correspond to symmetric complexes and the rest correspond to asymmetric complexes. Note that a receptor has 3^{n} possible states because each of its *n* tyrosines has three possible states: free and unphosphorylated or phosphorylated and phosphorylated and bound. Thus, the rules of Equations (6a-c) tend to imply a large reaction network. However, because each receptor tyrosine is independent, this network can be characterized by a number of coupled ODEs derived from the law of mass action that is much less than *N* (Borisov *et al.*, 2005; Conzelmann *et al.*, 2006). As we will see, DYNSTOC is able to simulate Equations (6a-c) without taking advantage of this simplifying insight, and we can compare the results against those obtained from the reduced system of ODEs, which is given below.

For *i*=1,…, *n* and *j*=1, 2, we can write the following mass-action equations:

where [*U*_{ij}] is the concentration of the *i*-th tyrosine in the *j*-th receptor in unphosphorylated form, [*P*_{ij}] is the concentration of the *i*-th tyrosine in the *j*-th receptor in phosphorylated form and unbound to adapter, [*AP*_{ij}] is the concentration of adapter protein bound to the *i*–th tyrosine in the *j*-th receptor, [*A*] is the concentration of free adapter protein, _{i} is the apparent first-order rate constant for autophosphorylation of the *i*-th tyrosine in a receptor (we assume that autophosphorylation is substrate limited), δ_{i} is the apparent first-order rate constant for dephosphorylation of the *i*-th tyrosine in a receptor (we assume that phosphatases are present in excess), *k*_{+i} is the rate constant for binding of the adapter protein to the *i*-th tyrosine in a receptor, and *k*_{−i} is the rate constant for dissociation of the adapter protein from the *i*-th tyrosine in a receptor. If we assume that mass is conserved on the time scale of interest, we can also write the following equation:

where [*A*_{T}] is total concentration of adapter protein. Equations (8a-c) and (9) can be solved using standard numerical methods to obtain results that can be compared against DYNSTOC simulation results.

### 3.2 Test case II: multivalent binding

We consider an idealized rule-based model for a system in which multivalent ligand–receptor binding can generate a multitude of ligand-induced receptor aggregates. The model captures the interactions of a soluble trivalent ligand with a mobile cell-surface bivalent receptor (Yang *et al.*, 2008). The rules of this model are given in BNGL as follows:

where ‘L’ denotes a ligand, ‘R’ denotes a receptor, ‘r’ denotes one of three identical binding sites on a ligand and ‘l’ denotes one of two identical binding sites on a receptor. The above rules represent capture of free ligand [Equation (10a)], receptor cross-linking [Equation (10b)] and ligand–receptor dissociation [Equation (10c)]. The model is relevant for studying several experimental systems (Bilgiçer *et al.*, 2007; Posner *et al.*, 2002, 2007; Sil *et al.*, 2007). An illustration of the model of Equations (10a-c) is available at the DYNSTOC web site.

A large number of acyclic receptor aggregates can arise through the interactions represented by the rules of Equations (10a-c), and as a result, this model, depending on parameter values, can be difficult or impossible to simulate with conventional approaches (Hlavacek *et al.*, 2006; Yang *et al.*, 2008). Moreover, correct simulation of Equations (10a-c) requires enforcement of the ‘+’ constraint of Equation (10b), which prohibits the formation of cyclic aggregates (Yang *et al.*, 2008). This type of constraint on molecularity is difficult and sometimes expensive to enforce (Yang *et al.*, 2008), but the need to enforce such a constraint is common, especially for rules that characterize aggregation phenomena.

Here, we will demonstrate how DYNSTOC simulations of Equations (10a-c) can be used to study the system of Posner *et al.* (2002). In particular, we will attempt to make a connection between cell-surface binding events and the cellular response to these events. The components of the system of Posner *et al.* (2002) include RBL cells, which express FcRI (the high-affinity cell-surface receptor for IgE antibody), a model antigen containing three symmetrically arrayed 2,4-dinitrophenol (DNP) hapten groups, and a bivalent monoclonal anti-DNP IgE antibody. The trivalent antigen interacts with anti-DNP IgE–FcRI complexes on RBL cells, which are long lived, to stimulate a robust cellular secretory response (Posner *et al.*, 2002). Aggregation of FcRI is known to trigger signaling events that can lead to degranulation (Metzger, 1992). We will assume, as in earlier work (Dembo and Goldstein, 1978), that the cellular secretory response to ligand correlates with the number of receptors in ligand-induced receptor aggregates at steady state. However, we will assume that receptor dimers are non-stimulatory because the bivalent analog of the trivalent ligand of Posner *et al.* (2002) does not elicit a secretory response. As shown in Figure 1, we can find values for parameters in the model of Yang *et al.* (2008), which we will call the TLBR model, such that ligand-induced receptor aggregation correlates with the secretory response to ligand. Below, we will use these parameter values to further investigate cell-surface ligand–receptor interactions.

### 3.3 Validation of simulation results

The results of simulations obtained using DYNSTOC and independent methods can be compared in Figure 2. The simulation results of Figure 2A were obtained by using DYNSTOC to process the rules of Equations (6a-c) and by numerical integration of Equations (8a-c) and (9) for *n*=6 [*N*=266 086; Equation (7)]. Interestingly, the time courses of Figure 2A indicate that ordered phosphorylation of tyrosines is possible for purely kinetic reasons. The simulation results of Figure 2B were obtained by using DYNSTOC to process the rules of Equations (10a-c) and by using a problem-specific implementation of the (general) method of Yang *et al.* (2008), which we will call the YMFH method, to process the same rule set. As can be seen, DYNSTOC produces results that are consistent with those obtained independently. Interestingly, the time courses of Figure 2B suggest that receptor trimers are predominantly responsible for the RBL secretory response to the trivalent ligand of Posner *et al.* (2002). The results of Figure 2 serve not only to validate DYNSTOC, but also to demonstrate that DYNSTOC is capable of simulating large-scale rule-based models.

### 3.4 Efficiency

Figure 3 illustrates the efficiency of DYNSTOC relative to the YMFH method. Both methods were used to simulate the rules of Equations (10a-c) over ranges of model parameter values that affect the complexity of simulation. Increasing the number of receptors *N*_{R} (Fig. 3A) increases the frequency of reactions, and increasing the dimensionless parameter β=*N*_{R}*k*_{+2}/*k*_{off} (Fig. 3B) increases the average size of ligand-induced receptor aggregates at equilibrium (Goldstein and Perelson, 1984). As can be seen, the computational cost of simulating Equations (10a-c) with DYNSTOC scales as a function of simulation complexity similarly to the YMFH method. However, the cost of DYNSTOC simulations can be orders of magnitude greater. The results of Figure 3 are not unexpected, because time steps in the method of DYNSTOC are fixed and must generally be smaller than in the YMFH method, which involves sampling of time steps from an exponential distribution, as in the method of Gillespie (2007). It should be emphasized that DYNSTOC is a general-purpose simulation tool, whereas the implementation of the YMFH method being considered here is problem specific. It should also be remembered that simulation of the rules of Equations (10a-c) is impractical with conventional methods (Hlavacek *et al.*, 2006; Yang *et al.*, 2008).

## 4 CLOSING REMARKS

Languages, such as BNGL, provide a means for the specification of kinetic models for signal-transduction and other biochemical systems in terms of rules for molecular interactions. However, because of combinatorial complexity (Hlavacek *et al.*, 2003, 2006), the biochemical reaction networks implied by typical rule sets are large scale. The process of deriving a network from rules is expensive, and rule-derived networks (if they can be practically generated from rules) are difficult to simulate using conventional ODE-based and stochastic simulation approaches. To address this problem, we have generalized the agent-based simulation method of StochSim and implemented the generalized method in software called DYNSTOC, which can interpret BNGL-encoded model specifications.

The generalized simulation method presented here differs from the original StochSim method in a number of ways. For example, model specification and reporting of simulation results are greatly eased by the ability to use BNGL. However, the main advance is reformulation of the StochSim method to enable explicit tracking of the connectivity of molecules in molecular complexes. This advance is enabled by the use of graphs to represent molecules and molecular complexes. The generalized method is also capable of accounting for a richer variety of reaction types, such as intramolecular association reactions, which, with the exception of special cases, cannot be considered within the original StochSim framework.

The results of Figure 3 suggest that the StochSim/DYNSTOC simulation method is less efficient than the YMFH method. These results provide motivation for development of general-purpose software implementing the YMFH method, and DYNSTOC should be useful for validating such software when available. Although other simulation methods may be more efficient, DYNSTOC should still be useful for simulating a wide variety of biochemical systems, and we find the capabilities of DYNSTOC demonstrated here to be quite exciting.

## ACKNOWLEDGEMENTS

We thank Guy Yosiphon for helpful discussions.

*Funding:* National Institutes of Health (AI35997, CA109552, GM076570 and RR18754); Department of Energy (DOE) contract DE-AC52-06NA25396; Arizona Biomedical Research Commission.

*Conflict of Interest:* none declared.

## References

- Andrei O, Kirchner H. Graph rewriting strategies for modeling biochemical networks. In: Negru V, et al., editors. Proceedings of the Ninth International Symposium on Symbolic and Numeric Algorithms for Scientific Computing. Piscataway, NJ: IEEE; 2008. pp. 407–414.
- Bilgiçer B, et al. A synthetic trivalent hapten that aggregates anti-2,4-DNP IgG into bicyclic trimers. J. Am. Chem. Soc. 2007;129:3722–3728. [PMC free article] [PubMed]
- Blinov ML, et al. BioNetGen: software for rule-based modeling of signal transduction based on the interactions of molecular domains. Bioinformatics. 2004;20:3289–3291. [PubMed]
- Blinov ML, et al. Graph theory for rule-based modeling of biochemical networks. Lect. Notes Comput. Sci. 2006;4230:89–106.
- Borisov NM, et al. Signaling through receptors and scaffolds: independent interactions reduce combinatorial complexity. Biophys. J. 2005;89:951–966. [PMC free article] [PubMed]
- Chatterjee A, Vlachos DG. An overview of spatial microscopic and accelerated kinetic Monte Carlo methods. J. Comput. Aided Mater. Des. 2007;14:253–308.
- Conzelmann H, et al. A domain-oriented approach to the reduction of combinatorial complexity in signal transduction networks. BMC Bioinformatics. 2006;7:34. [PMC free article] [PubMed]
- Danos V. Agile modelling of cellular signalling. AIP Conf. Proc. 2007;963:611–614.
- Danos V, Laneve C. Formal molecular biology. Theor. Comput. Sci. 2004;325:69–110.
- Danos V, et al. Rule-based modelling of cellular signalling. Lect. Notes Comput. Sci. 2007a;4703:17–41.
- Danos V, et al. Scalable simulation of cellular signaling networks. Lect. Notes Comput. Sci. 2007b;4807:139–157.
- Dembo M, Goldstein B. Theory of equilibrium binding of symmetric bivalent haptens to cell surface antibody: application to histamine release from basophils. J. Immunol. 1978;121:345–353. [PubMed]
- Dematté L, et al. The BlenX language: a tutorial. Lect. Notes Comput. Sci. 2008;5016:313–365.
- Erickson J, et al. The effect of receptor density on the forward rate constant for binding of ligands to cell surface receptors. Biophys. J. 1987;52:657–662. [PMC free article] [PubMed]
- Faeder JR, et al. Rule-based modeling of biochemical networks. Complexity. 2005a;10:22–41.
- Faeder JR, et al. Graphical rule-based representation of signal-transduction networks. In: Liebrock LM, editor. Proceeings of the 2005 ACM Symposium on Applied Computing. New York: ACM Press; 2005b. pp. 133–140.
- Faeder,J.R. rt.al. (in press) Rule-based modeling of biochemical systems with BioNetGen. Methods Mol. Biol. [PubMed]
- Gillespie DT. Stochastic simulation of chemical kinetics. Annu. Rev. Phys. Chem. 2007;58:35–55. [PubMed]
- Goldstein B, Perelson AS. Equilibrium theory for the clustering of bivalent cell surface receptors by trivalent ligands. Application to histamine release from basophils. Biophys. J. 1984;45:1109–1123. [PMC free article] [PubMed]
- Hlavacek WS, et al. The complexity of complexes in signal transduction. Biotechnol. Bioeng. 2003;84:783–794. [PubMed]
- Hlavacek WS, et al. Rules for modeling signal-transduction systems. Sci. STKE. 2006;2006:re6. [PubMed]
- Kitano H, et al. Using process diagrams for the graphical representation of biological networks. Nat. Biotechnol. 2005;23:961–966. [PubMed]
- Le Novère N, Shimizu TS. StochSim: modelling of stochastic biomolecular processes. Bioinformatics. 2001;17:575–576. [PubMed]
- Li H, et al. Algorithms and software for stochastic simulation of biochemical reacting systems. Biotechnol. Prog. 2008;24:56–61. [PMC free article] [PubMed]
- Lok L, Brent R. Automatic generation of cellular reaction networks with Moleculizer 1.0. Nat. Biotechnol. 2005;23:131–136. [PubMed]
- Metzger H. Transmembrane signaling: the joy of aggregation. J. Immunol. 1992;149:1477–1487. [PubMed]
- Mjolsness E, Yosiphon G. Stochastic process semantics for dynamical grammars. Ann. Math. Artif. Intell. 2006;47:329–395.
- Morton-Firth CJ, Bray D. Predicting temporal fluctuations in an intracellular signalling pathway. J. Theor. Biol. 1998;192:117–128. [PubMed]
- Mu F, et al. Carbon-fate maps for metabolic reactions. Bioinformatics. 2007;23:3193–3199. [PubMed]
- Posner RG, et al. A quantitative approach for studying IgE-FcRI aggregation. Mol. Immunol. 2002;38:1221–1228. [PubMed]
- Posner RG, et al. Trivalent antigens for degranulation of mast cells. Org. Lett. 2007;9:3551–3554. [PMC free article] [PubMed]
- Schulze TP. Efficient kinetic Monte Carlo simulation. J. Comput. Phys. 2007;227:2455–2462.
- Shimizu TS, Bray D. Computational cell biology—the stochastic approach. In: Kitano H, editor. Foundations of Systems Biology. Cambridge, MA: MIT Press; 2001. Ch. 10.
- Sil D, et al. Trivalent ligands with rigid DNA spacers reveal structural requirements for IgE receptor signaling in RBL mast cells. ACS Chem. Biol. 2007;2:674–684. [PMC free article] [PubMed]
- Slepoy A, et al. A constant-time kinetic Monte Carlo algorithm for simulation of large biochemical reaction networks. J. Chem. Phys. 2008;128:205101. [PubMed]
- Xu K, et al. Kinetics of multivalent antigen DNP-BSA binding to IgE-FcRI in relationship to the stimulated tyrosine phosphorylation of FcRI. J. Immunol. 1998;160:3225–3235. [PubMed]
- Yang J, et al. Kinetic Monte Carlo method for rule-based modeling of biochemical networks. Phys. Rev. E. 2008;78:031910. [PMC free article] [PubMed]

**Oxford University Press**

## Formats:

- Article |
- PubReader |
- ePub (beta) |
- PDF (184K)

- RuleMonkey: software for stochastic simulation of rule-based models.[BMC Bioinformatics. 2010]
*Colvin J, Monine MI, Gutenkunst RN, Hlavacek WS, Von Hoff DD, Posner RG.**BMC Bioinformatics. 2010 Jul 30; 11:404. Epub 2010 Jul 30.* - The efficiency of reactant site sampling in network-free simulation of rule-based models for biochemical systems.[Phys Biol. 2011]
*Yang J, Hlavacek WS.**Phys Biol. 2011 Oct; 8(5):055009. Epub 2011 Aug 10.* - Rules for modeling signal-transduction systems.[Sci STKE. 2006]
*Hlavacek WS, Faeder JR, Blinov ML, Posner RG, Hucka M, Fontana W.**Sci STKE. 2006 Jul 18; 2006(344):re6. Epub 2006 Jul 18.* - Rule-based modeling of biochemical systems with BioNetGen.[Methods Mol Biol. 2009]
*Faeder JR, Blinov ML, Hlavacek WS.**Methods Mol Biol. 2009; 500:113-67.* - Rule-based modeling of signal transduction: a primer.[Methods Mol Biol. 2012]
*Sekar JA, Faeder JR.**Methods Mol Biol. 2012; 880:139-218.*

- Exact Hybrid Particle/Population Simulation of Rule-Based Models of Biochemical Systems[PLoS Computational Biology. ]
*Hogg JS, Harris LA, Stover LJ, Nair NS, Faeder JR.**PLoS Computational Biology. 10(4)e1003544* - A predictive mathematical model of the DNA damage G2 checkpoint[Journal of theoretical biology. 2013]
*Kesseler KJ, Blinov ML, Elston TC, Kaufmann WK, Simpson DA.**Journal of theoretical biology. 2013 Mar 7; 32010.1016/j.jtbi.2012.12.011* - Specification, annotation, visualization and simulation of a large rule-based model for ERBB receptor signaling[BMC Systems Biology. ]
*Creamer MS, Stites EC, Aziz M, Cahill JA, Tan CW, Berens ME, Han H, Bussey KJ, Von Hoff DD, Hlavacek WS, Posner RG.**BMC Systems Biology. 6107* - A computational model for early events in B cell antigen receptor signaling: analysis of the roles of Lyn and Fyn[Journal of Immunology (Baltimore, Md. : 195...]
*Barua D, Hlavacek WS, Lipniacki T.**Journal of Immunology (Baltimore, Md. : 1950). 2012 Jul 15; 189(2)646-658* - Leveraging Modeling Approaches: Reaction Networks and Rules[Advances in Experimental Medicine and Biolo...]
*Blinov ML, Moraru II.**Advances in Experimental Medicine and Biology. 2012; 736517-530*