
Madame Curie Bioscience Database [Internet]. Austin (TX): Landes Bioscience; 2000-2013.

Foundations of E-Cell Simulation Environment Architecture

and *.

* Corresponding Author: Koichi Takahashi—Molecular Sciences Institute, Berkeley, CA, USA. Email: ktakahashi@molsci.org

Introduction

The thorough overview of the E-Cell Simulation Environment in this chapter provides a foundation for understanding the systems biology research, presented throughout this book, that uses the E-Cell Simulation Environment. To begin this inquiry, we open with the most general question possible: what is the E-Cell Simulation Environment? The answer is that the E-Cell Simulation Environment (commonly abbreviated E-Cell SE, or even SE) is a simulator of cellular systems models. It is the primary component of a three-program software platform, collectively called the E-Cell System, for creating, simulating and analyzing biological models. As the simulator in this larger environment, the E-Cell Simulation Environment takes user-defined abstract model descriptions, translates them into its own internal model format and calculates trajectories of those models through time, either recording the results to a file for later analysis or running interactively so that the model state can be viewed or modified by the user at any point during execution.

For any simulator, two of the most relevant questions concern the type of system the program models and the algorithms the program can use to perform the modeling. As stated above, the E-Cell System was created to model and simulate cellular systems, but this is not the complete story. The E-Cell System generally, and the E-Cell Simulation Environment specifically, are fundamentally generic modeling platforms. While they come specialized “out of the box” for cellular modeling, they can simulate any mathematical model. What does this mean? It means that whenever a system can be described formally as a set of variables interacting through mathematical relationships such as equations, relations and constraints, that description can be expressed (and then simulated) in a natural way as an E-Cell Simulation Environment model. E-Cell SE can simulate any mathematical model, no matter what types of mathematical relationships that model describes or the combinations in which those relationships occur.

The following simple example illustrates how the E-Cell Simulation Environment is more generic than the average biochemical simulator. To model a biochemical system containing three chemical species—A, B and C, such that A and B react to form C with some observed rate—there are two (many more than two, actually) ways to proceed, depending upon the interpretation of the verb “react”. One way is to define the reaction using a differential equation stating that the quantities involved convert from one to another at a rate proportional to the product of the concentrations of the reactants: a mass-action reaction. A second way of describing a “reaction” would be as a Gillespie Process, which states that A and B react to form C in atomic jumps corresponding to individual reaction events: the times of those events are calculated by sampling exponential distributions that depend on the population numbers of A and B, and each event decreases the values of A and B by one and increases the value of C by one. These two models are distinct and equally valid mathematical descriptions of the described physical system. While most biological simulators use either mass-action equations or Gillespie equations for describing systems (that is, they typically are built around a single type of algorithm), the E-Cell Simulation Environment can use either. Furthermore, what really makes the E-Cell Simulation Environment generic is its extensibility: any mathematical description of what the word “reacts” might mean can be translated into computer instructions and then used within E-Cell model files.*

To understand how the E-Cell Simulation Environment works, this chapter examines how E-Cell SE allows any mathematical model to be expressed as an E-Cell model capable of being simulated, as well as the architecture that makes this possible.

Although it is a generic simulator, the E-Cell Simulation Environment comes packaged with many features, including a toolkit of algorithms commonly used in the field, that focus it on cellular modeling and simulation.

The key to understanding how the generic core of the E-Cell Simulation Environment works lies in the intersection between model syntax, model semantics and algorithm implementation. The basic idea is that within the E-Cell Simulation Environment, all processes that update variable values (these correspond to mathematical relationships within the model, for example, a single mass-action equation) are defined in terms of the same internal interface, called the Process interface. (An object possessing such an interface in the SE environment is called a Process. Whenever the word is capitalized, it refers to an algorithm that has this interface in E-Cell SE.) This interface supports reading variable values within the simulation environment followed by instantaneously updating either some variable values or some variable derivatives. Because any algorithm used for systems modeling can be described as a process that updates certain variable values and velocities given the state of other variables and variable velocities, the E-Cell Simulation Environment is able to treat all conceivable simulation algorithms uniformly, without needing a priori information about their implementation. The result is that internally, the E-Cell Simulation Environment represents models as combinations of lists of variables and lists of abstract Processes that supply the relationships between those variables. While the E-Cell Simulation Environment treats all algorithms both abstractly and uniformly, when called upon to act (to “fire” in E-Cell SE terminology), each Process has its own individual implementation, which defines the exact behavior of that algorithm. By using different combinations of variables and Processes connecting those variables, users can construct E-Cell models that represent any physical system (or any mathematical model, depending on one's point of view), which can then be simulated in the Simulation Environment. The Process interface defined by E-Cell is the foundation for the entire E-Cell Simulation Environment and makes E-Cell SE both generic and extendible.
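
To make this concrete, the following Python sketch shows the general shape of such an interface. It is purely illustrative: the real interface is a C++ class inside the kernel, and the names used here (Variable, variable_references, initialize, fire) are chosen to mirror the text rather than to reproduce the actual API.

    # Illustrative sketch only: the real Process interface is a C++ class in the
    # E-Cell kernel; the names below mirror the text, not the actual API.
    class Variable:
        def __init__(self, name, value=0.0):
            self.name = name
            self.value = value       # the current quantity held by this Variable
            self.velocity = 0.0      # accumulated rate of change, used by continuous Processes

    class Process:
        """Uniform interface: read some Variables, then update values or velocities."""
        def __init__(self, variable_references):
            # maps a role name to a Variable this Process reads and/or writes
            self.variable_references = variable_references

        def initialize(self):
            pass                     # optional set-up before the simulation starts

        def fire(self):
            raise NotImplementedError  # each algorithm supplies its own implementation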

To illustrate how E-Cell SE uses the universal Process interface to simulate generic models, let us revisit the above example, with two species that combine into a third, represented using either mass-action equations or Gillespie equations. Because E-Cell SE views models through the lens of the Process interface, the two models have an identical structure according to this view. Both possess a list of three variables, A, B and C, and both have one Process that reads and then updates the variable values. The difference in model behavior comes from the implementation of the two different Processes. In the mass-action based model, the Process reads the values of A and B, along with the always-defined E-Cell SE variable called volume. The Process uses this information to calculate the concentrations of A and B and then uses their product to calculate a velocity delta, which it adds to the velocity of the variable C (recall that a mass-action equation representing the reaction A+B -> C is a differential equation of the form d[C]/dt = k[A][B]). In the Gillespie model, the Process uses the volume, the values of A and B and a random number to determine the time at which the values of A and B should decrement by one and the value of C increment by one.
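
Continuing the sketch above, the two readings of “A and B react to form C” become two implementations of the same interface. The class names, the handling of units and volume, and the exponential waiting-time sampling below are illustrative simplifications and are not E-Cell's actual process classes.

    import math
    import random

    class MassActionProcess(Process):
        """Continuous interpretation: contributes velocities according to d[C]/dt = k[A][B]."""
        def __init__(self, A, B, C, k, volume):
            super().__init__({'A': A, 'B': B, 'C': C})
            self.A, self.B, self.C, self.k, self.volume = A, B, C, k, volume

        def fire(self):
            # concentration-based rate, converted back to a particle-number velocity
            rate = self.k * (self.A.value / self.volume) * (self.B.value / self.volume)
            delta = rate * self.volume
            self.A.velocity -= delta
            self.B.velocity -= delta
            self.C.velocity += delta

    class GillespieProcess(Process):
        """Discrete interpretation: single reaction events A + B -> C at exponentially
        distributed waiting times whose propensity depends on the populations."""
        def __init__(self, A, B, C, c):
            super().__init__({'A': A, 'B': B, 'C': C})
            self.A, self.B, self.C, self.c = A, B, C, c   # c: stochastic rate constant

        def propensity(self):
            return self.c * self.A.value * self.B.value

        def next_event_time(self, now):
            a = self.propensity()
            return math.inf if a == 0 else now + random.expovariate(a)

        def fire(self):
            # a single reaction event: decrement the reactants, increment the product
            self.A.value -= 1
            self.B.value -= 1
            self.C.value += 1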

While these two models have different simulation trajectories, the difference is entirely encapsulated within the algorithms' implementations, hidden from the E-Cell Simulation Environment behind identical interfaces. With this system, E-Cell SE only has to be concerned with defining variables, defining the universal Process interface and driving the two. Model builders can implement any algorithms, as long as they follow the rules of the E-Cell Process interface. Then, as they build models, they can do so using any combination of algorithms, simply by invoking them within model files using the names given to those implementations. Because the E-Cell Simulation Environment treats algorithms uniformly, all the semantics of models simulated by the E-Cell Simulation Environment are defined by the model builders themselves.

The goal is that by combining model building with algorithm implementation, modeling any system within the E-Cell Simulation Environment will be as easy and as open ended as directly modeling the same system mathematically. The resultant model structure will ultimately be shaped by the user's needs rather than by the capabilities of the software.

Hopefully the reader is by now convinced of the E-Cell SE's capability as a generic simulator. With that perspective of the program covered, we now move in the other direction and emphasize that the E-Cell Simulation Environment, as well as the E-Cell System, comes structured and organized as a biological simulator of cellular systems. In fact, virtually all literature discussing the E-Cell System discusses it almost exclusively as this type of simulator. How can this be understood, given the effort just spent discussing how the core E-Cell Simulation Environment is a generic simulator? Although E-Cell SE is a fundamentally generic platform, it was created as one part of an ongoing biological project to simulate whole-cell models on computers. Because of this, E-Cell is set up out of the box to support cellular modeling. Common algorithms used within this field are supplied already implemented for use within E-Cell models and need only be called within model files in order to be used within a model. Additionally, the organization of the E-Cell Simulation Environment application—which includes but is not limited to its generic kernel, the component that actually drives the simulation of models—has been developed with the needs of biologists in mind. Its default workflow is applicable to many nonbiological fields and can also be extended or modified as needed for those projects where it is inadequate, but the default configuration of the E-Cell System has been created with the needs of systems biologists and other cellular modelers in mind. This explains the dual generic/specialized nature of the E-Cell System: while the core engine of the E-Cell Simulation Environment is a completely generic and extendible system for simulating arbitrary models, it comes “pre-extended” for cellular and sub-cellular modeling.

Background

At this point, we understand that the E-Cell Simulation Environment is a unique blend of generic and cellular simulation systems. We also have a basic understanding that this blend is possible because of the way the E-Cell Simulation Environment uniformly treats algorithms, which allows users to extend the Simulation Environment with any new algorithm they need to include in their models. Before we cover the architectural details used by the Simulation Environment to make this possible, knowing the background of the E-Cell project will help provide a better understanding of the forces that have forged the E-Cell System and the E-Cell Simulation Environment into the unique form they take today.

These programs originated at Keio University in 1996, with the launch of a biological program called the E-Cell Project that aimed to reconstruct a whole cell in silico. For this project the organism Mycoplasma genitalium, which possesses the smallest known genome, was chosen as the target. One branch of the project consisted of initial work on a simulator that could run the whole-cell models the project would produce. (This simulator would eventually develop into the current version of the E-Cell Simulation Environment, over a period of many years and several major software releases.) The initial work in engineering this program was to perform a meta-study of the field of biological modeling in order to establish a requirements analysis that would determine what features a simulator capable of running whole-cell models would have to possess. This investigation uncovered that, while the simulation of biological cells is similar in many respects to the simulation of many other types of complex systems, cellular systems typically have features that pose unique challenges to their simulation. The specific discovery was that, in categorizing systems based on modeling requirements, there are at least three axes of complexity, and cellular systems rank particularly high on two of them.

The first type of complexity the E-Cell Project observed was that cellular systems typically have high copy numbers of components, implying that the total number of interactions taking place within a cell during any given time interval is large. This type of complexity is common in the world of simulation. However, the second type of complexity observed within cellular systems by the E-Cell group was less often found in the world of complex systems. This type of complexity is ontological, which means that within most cellular systems the number of distinct interaction types that can be observed is large. Cellular processes including metabolism, signal transduction, gene expression, cytoskeletal dynamics and cytoplasmic streaming are all processes that cause interactions between intracellular components but otherwise differ fundamentally from one another in their behaviors, possessing different properties such as time scales, number of intracellular components affected, global effects on the cell, dynamics, etc.

Ontological complexity poses a challenge to programs that wish to simulate these systems. Because of the variety of intra- and extracellular behaviors, the current state of the art in cellular simulation is a rich ecosystem of algorithms, each representing some aspect of the processes that occur within cellular systems. Note that the alternative to this approach would be to attempt to produce a monolithic “universal” algorithm able to model all these different processes by itself. However, at the present time, no algorithm known to the field of cellular modeling can claim to be universal, in the sense that it could be used to produce accurate simulations of all cellular systems in a timely fashion. For example, it is possible to build whole-cell models consisting entirely of systems of mass-action equations. While such a model potentially could be simulated efficiently, it probably would not be particularly accurate, because it likely would represent a gross oversimplification of the system. At the other extreme, it would also be possible to build whole-cell models by modeling all cellular contents and interactions in terms of Brownian motion, collisions between three-dimensional objects and geometry. But while such a simulation might be very accurate, it would also be simply too large to simulate on any conceivable computer. Given the range of intracellular behaviors, it seems unlikely that a universal algorithm will ever be found; at minimum, such an algorithm does not currently exist. Therefore, to produce serious whole-cell models, the only realistic course is to use large sets of algorithms in various combinations to create and simulate models, so that different algorithms can be applied where they are best suited.

Given the necessity of using many different algorithms to model whole-cell systems, the E-Cell group made yet another finding. It was observed that cellular systems, at both the sub-system and whole-system levels, have highly nonlinear dynamics in all but the most trivial cases. The implication is that no matter which sub-systems of the cell are studied and no matter how well sub-system behaviors are individually understood, that knowledge will form an incomplete framework for understanding the whole system. Such investigations can be very useful in understanding the behavior of whole cells, but in the end cannot be completely explanatory. Some understanding of whole-cell behavior can only come through considering the system as a whole. This property made it clear to the E-Cell Project that a strategy of using many different simulators to independently investigate different cellular subsystems would be doomed from the start to be incomplete. The ultimate conclusion reached was that to model and simulate cellular systems, a new type of generic simulator that could use arbitrary algorithms within the same model had to be built. With that understanding, the concept of the E-Cell System was conceived.

With the bold goal of building a generic simulator set out during the first phase, work turned to implementation. How could arbitrary algorithms be used within the same simulation framework? This problem was solved by Koichi Takahashi1 with the development of a computational scheme he called the meta-algorithm. The meta-algorithm provides the theoretical solution as to how a model using multiple algorithms can be simulated as a single unit to produce a trajectory of the whole system. The true importance of the meta-algorithm to the success of the E-Cell System can be expressed by noting that the entire core of the E-Cell Simulation Environment is hardly more than an implementation of the meta-algorithm.

The meta-algorithm works by classifying each potential simulation algorithm that can be mathematically defined into two types based on whether they are continuous (equations that represent continually changing quantities) or discrete (equations representing quantities that change at specific moments). For each algorithm in each of these groups, the meta-algorithm records which variables the algorithm reads as input and which it modifies as its output and uses this to calculate a dependency relation amongst the algorithms used within the model: one algorithm is dependent on another if it reads variables that the other modifies. The meta-algorithm then specifies the exact order in which the different algorithms should be used (these are called events) on an initial state as well as how much time should be advanced for each event in order to simulate the whole model.

The meta-algorithm framework provides the platform that resolves both concerns raised above: the need to use many different algorithms and in a combined form. Thus, this platform allows the building of cellular models using appropriate representations at each level of modeling. Put simply, the meta-algorithm makes a platform for generic modeling possible. Although a generic system like this is more difficult to implement and requires a more complicated architecture than a simple simulator that implements only one algorithm, it has many advantages that easily allow it to surpass any such concrete system. For instance, any model that can be run in any specific simulator can be run within E-Cell, because a model using only one formalism is a special case of a multi-algorithm model. A second advantage is that being able to use multiple algorithms encourages modelers to perform their craft in a very natural way: by mentally decomposing systems into sub-systems, modeling the sub-systems individually using appropriate algorithms and then specifying the coupling between the sub-systems to create a whole cell model. Not only is this a very natural and straightforward way to model large systems, but it also allows the sub-systems to be simulated individually with no additional work. This was the first major set of results produced by the E-Cell Project in the direction of whole cell modeling.

Other design considerations made by the E-Cell Project leading up to the construction of the E-Cell Simulation Environment came from the experience of E-Cell Project members as to how biological simulators are ultimately used by systems biology researchers in labs. “In silico” research is usually only one part of a complicated process of biological knowledge creation, involving wet lab experimentation, modeling, simulation and analysis. In these laboratory settings, biological models are built from experimental results and their purpose is to explain and extend those results. In these dynamic environments, each new piece of data and each limitation in the explanatory power of a model are likely to propagate changes in the model. At the frontiers of biological research, a model representing “best understanding” could be under near constant revision. In addition, the simulation of models must be configurable to accommodate the range of approaches scientists might wish to use as a part of their research. These approaches might include scripting multiple simulation runs with varying inputs, running simulations on parallel or grid-based hardware and investigating models through a graphical user interface where each variable in the model can be inspected and modified at any moment in time. Another aspect of this configurability is that within labs simulators are often used as one link in a chain of software programs; any generic biological simulator must be configurable enough that it can collect data from and send results to arbitrary data sources. This high level of configurability is critical for a simulator to be useful to a community of researchers, each with different needs.

The E-Cell System

With an initial requirements analysis completed and a theoretical foundation for development laid out in the meta-algorithm, work began on building a specific software system for modeling whole-cell systems. As we now know, the result of this work was a suite of software, called the E-Cell System, which is a complete environment for the modeling, simulation and analysis of complex biological systems (Fig. 1). The E-Cell System consists of three components: the E-Cell Modeling Environment, which allows for collaborative and distributed modeling of cellular systems; the E-Cell Simulation Environment, which runs simulations of models; and the E-Cell Analysis Toolkit, which is composed of a set of scripts for mathematically analyzing the results of E-Cell Simulation Environment simulations.

Figure 1. Overview of the E-Cell System. The E-Cell System consists of three components, the Modeling Environment, the Simulation Environment and the Analysis Toolkit.

The E-Cell Modeling Environment (also called E-Cell ME) is a computer environment for the modeling of cellular systems. As computer processing speeds increase, along with the quantities of available genomic and proteomic data for any given system, the average size of biological models is constantly increasing. Preparing models by hand is becoming increasingly difficult and, for typical model sizes, may soon become impractical. In order to take advantage of faster computers and additional data, new automated methods of model production must be created, so that computers can be “taught” by humans how to build models, instead of humans doing all the work manually. The E-Cell Modeling Environment is an attempt to meet this need. The E-Cell Modeling Environment is built around the idea that model building occurs in several stages: data collection, data integration and initial editing of the model, which results in an initial approximation of a model of the system. This model is simulated and analysis of the results leads to additional model refinement. The E-Cell Modeling Environment provides tools that address each of these stages and has been created as a generic modeling environment, analogous to the way the E-Cell Simulation Environment is a generic simulation environment.

Once a model has been created using the Modeling Environment and simulated in the Simulation Environment, it must be analyzed either to refine the model or to learn new facts about the behavior of the system being modeled. For this, the E-Cell System provides the E-Cell Analysis Toolkit, which consists of a series of mathematical scripts that analyze the behavior of a model. For tasks such as model refinement, the E-Cell Analysis Toolkit provides scripts for parameter tuning that help fit a model to some observed system output. For the analysis of already-tuned models, the E-Cell Analysis Toolkit provides scripts for bifurcation analysis, which take a model that may exhibit several different behaviors depending on its initial conditions and delineate the boundaries in the state space that separate regions leading to one outcome from regions leading to another.

These two programs, along with the E-Cell Simulation Environment, combine to form a complete platform for “in silico” biological research and provide a useful tool for biological researchers in the field.

The Meta-Algorithm

The E-Cell Simulation Environment enables the simulation of models constructed using virtually any combination of continuous and discrete algorithms using a formalism called “the meta-algorithm”, which is a framework in which various simulation algorithms can be run in concert. The importance of the meta-algorithm to E-Cell cannot be overstated. Because the implementation of the kernel of the Simulation Environment is primarily an implementation of the meta-algorithm, the architecture of the meta-algorithm forms a substantial subset of the architecture of the E-Cell Simulation Environment and thus bears our initial attention.

The meta-algorithm originated in the field of discrete event simulation. One important insight obtained by this field is that all time-driven simulation algorithms can be classified into one of three categories based on how they update variable values: differential equations (equations that modify variables by changing their velocities), discrete time equations (equations that update variables by instantaneously and directly modifying their values) and discrete event equations (equations that represent quantities changed as the result of another event within the model occurring). Using this classification, discrete event simulation provides another result, given by Zeigler,2 which states that these three types of algorithms can be integrated in what is called a discrete-event world view (DEVS). In this formalism, a model state consists of a set of variables that is updated at discrete times along with a global event queue listing the state-changing events that are scheduled to occur and their times of occurrence.

Time advances in a discrete event system by taking the first event in the event queue, advancing global time to the moment of that event's occurrence and executing the event, which causes state changes to the model. Depending on whether the event is discrete or continuous, the execution either modifies model variable values directly or changes the velocities of variables within the model. Next, time advances from the occurrence of the first event to the next scheduled event by using the recorded variable velocities to integrate all model variables to the time of the next event. By alternately executing events and integrating state, a DEVS simulator can calculate the trajectory of the model through time by calculating a sequence of states, one for each time an event occurs. If the model state is needed between the occurrences of two events, the model state can always be integrated from the time of the previously occurring event to the time the state is needed. Thus the discrete-event world view describes one way to create a generic simulator.
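
The loop below sketches this alternation in Python. It assumes Event objects that expose execute() and next_time() methods, and Variables like those in the earlier sketch; both the method names and the linear extrapolation used for the integration step are illustrative simplifications rather than the kernel's actual machinery.

    import heapq
    import itertools

    def run_devs(events, variables, t_end):
        """Minimal discrete-event loop: pop the earliest event, integrate continuous
        state up to its time using recorded velocities, execute it, reschedule it."""
        counter = itertools.count()          # tie-breaker keeps heap entries comparable
        queue = []
        for event in events:
            first_time = event.next_time(0.0)
            if first_time is not None:
                queue.append((first_time, next(counter), event))
        heapq.heapify(queue)

        t = 0.0
        while queue and t < t_end:
            t_next, _, event = heapq.heappop(queue)
            for v in variables:              # integrate continuous state to t_next
                v.value += v.velocity * (t_next - t)   # linear extrapolation, for illustration
            t = t_next
            event.execute(t)                 # a discrete change, or new velocities
            t_again = event.next_time(t)
            if t_again is not None:
                heapq.heappush(queue, (t_again, next(counter), event))
        return t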

The meta-algorithm, developed by Koichi Takahashi, is a concrete specification of a discrete-event world view system that has been implemented with efficiency in mind. The discussion of the meta-algorithm we present here will only be general, as a more detailed account is outside the scope of this text. See Takahashi, 2003, for the definitive treatment.

The meta-algorithm describes in detail how a model using multiple algorithms can be unified in a discrete-event world view framework. It is called a “meta”-algorithm because it is only a template for a simulator and only becomes a concrete algorithm when a particular model using a particular combination of algorithms is interpreted. The specification of the meta-algorithm begins by specifying the data structures used to represent a multi-algorithm model. The most fundamental data structure defined is an object called Model, which is defined as a set of Variable objects and a set of Stepper objects. A Variable is defined as a single named real value, such that the Model object has the property that its state at any given time is completely described by the state of its set of Variables.

A Stepper can be explained by describing it as a computational subunit of the Model object, representing some subset of the total set of interactions that occur within the Model. Each Stepper object consists of a set of Processes, which are objects encapsulating specific algorithms, an interruption method, a local Stepper time and a time step interval. Each event in the meta-algorithm framework consists of a single Stepper “stepping”, a term that describes the process by which a Stepper uses its Processes to update the Model, notify other Steppers in the Model of the changes made and prepare itself for stepping again by rescheduling itself as an event.

Within these computational subunits, a Process is defined as an object that uses some subset of the current Model state, as well as a time interval, to update Variables in the Model to a new state. Processes are organized by the way in which they use the Model's current state to modify that state, by noting that for any Process in a Model, two sets of variables can be identified. The first is that Process's set of accessor variables, which are the variables used by that Process to read the environment in order to calculate the future state. The second is its set of mutator variables, which are the variables actually updated by this Process (note that a particular Variable might appear within both sets). Using the classification from discrete event simulation presented above, the meta-algorithm characterizes any Process as either continuous or discrete; it further states that an individual Stepper can only drive a set consisting of one type of Process and calls these types Continuous Steppers, Discrete Time Steppers or Discrete Event Steppers.

At this point, the Model specified by the meta-algorithm globally looks like a set of real values, as well as a set of computational subunits, each of which represents a continuous or discrete set of behaviors that causes change within the model. Two more pieces of data are required. The first is a global time value (which is always equal to the minimum of the set of local times of the Steppers). The second is a binary relation on the Steppers, called the Stepper Dependency. This relation is defined in the following way: a pair of non-equal Steppers (S1, S2) is in the Stepper Dependency if Stepper S1 contains a Process Pi and Stepper S2 contains a Process Pj such that the intersection of the mutator variables of Pi with the accessor set of Pj is nonempty, which means that two Steppers are related if the first modifies a value that the second needs to read. This is the data the meta-algorithm works on, and with that covered, we can now move on to explaining how the meta-algorithm advances time and drives simulation.
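
In code, this relation can be computed directly from the accessor and mutator sets. The sketch below assumes each Stepper exposes its Processes and each Process exposes its two variable sets under illustrative attribute names; the kernel computes the equivalent relation in C++ when a model is initialized.

    def stepper_dependency(steppers):
        """Return the set of ordered pairs (S1, S2) of distinct Steppers such that
        some Process in S1 mutates a Variable that some Process in S2 accesses."""
        dependency = set()
        for s1 in steppers:
            mutated = set()
            for p in s1.processes:
                mutated |= set(p.mutator_variables)
            for s2 in steppers:
                if s2 is s1:
                    continue
                accessed = set()
                for p in s2.processes:
                    accessed |= set(p.accessor_variables)
                if mutated & accessed:       # nonempty intersection: S2 depends on S1
                    dependency.add((s1, s2))
        return dependency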

For any system represented as a Model, the meta-algorithm specifies how time can be advanced. Like any system built using a discrete-event world view, time is advanced within the meta-algorithm as a series of discrete events, where events consist of individual Steppers “firing”. Therefore, each iteration of the meta-algorithm consists of several parts: choosing the next Stepper to execute by comparing their local times, preparing the Model to run that event, “firing” that Stepper and then resolving the Model so that everything is ready to iterate again.

Each iteration begins with choosing the next Stepper to execute. Because each Stepper keeps a record of when it ought to step next, finding the next executing Stepper is quite simple: it is the Stepper with the smallest time of next firing.

The next step is to advance time in the Model to the time of the scheduled event. Because each event occurs at a discrete time, the interval between any two consecutive events is, in general, nonzero. And because each round of iteration can end leaving Variables with nonzero velocity, this means that the Model state at the time of the previously occurring event must be integrated to the present time, using some kind of extrapolation based on previously recorded variable velocity changes.

Next comes Stepper stepping, a process consisting of several parts: modifying global time, updating variable values within the model, preparing to run again in the future and notifying other Steppers of the changes made.

The first portion consists of the executing Stepper's step function being called. The step function may call one or more of the Processes owned by that Stepper, which, depending on whether the Stepper is of the continuous or the discrete variety, results in either discrete changes to the values of the Variables or changes to the Variable value velocities. This function also causes the time variables of the executing Stepper to be modified. First, the Stepper's local time is updated by adding the current step interval to the current local time; second, based on the Model state after Processes have been fired, a new time-step interval is chosen, preparing the Stepper for its next firing.

The second portion of the firing process consists of notifying all Steppers whose Processes access variables modified by the firing Stepper, informing them of relevant changes to the model so they can update their future behavior accordingly, such as the next time they are scheduled to step. For this, the global Stepper Dependency is used. For an executing Stepper S, all pairs (S, D) are found and the interruption method for each such Stepper D is called. This allows Steppers that depend on the data modified by the executing event to examine those changes and update their next time of execution if needed. Once this is completed, Stepper firing is finished, leaving only the “in-between steps” of recording the model state and checking to see if simulation-end conditions have been met.
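
The notification stage can be sketched as follows, reusing the dependency relation from the earlier sketch; interrupt() and reschedule() are illustrative names for the interruption method and the scheduler update described in the text.

    def notify_dependents(fired_stepper, dependency, scheduler):
        """Interrupt every Stepper that reads what the fired Stepper wrote, so it can
        inspect the changes and, if needed, choose a new time of next stepping."""
        for writer, reader in dependency:
            if writer is fired_stepper:
                reader.interrupt(fired_stepper)
                scheduler.reschedule(reader)   # keep the event queue ordered by time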

This is the meta-algorithm. It specifies how a generic simulator simulating any model can be efficiently implemented using a particular implementation of a discrete-event world view simulator. Conceptually, this meta-algorithm forms the foundation for the E-Cell Simulation Environment. In fact, the E-Cell Simulation Environment kernel is practically a direct implementation of the formalism specified above. Everything else is largely a product of software design, wrapping this generic simulator in clothing that makes it configurable and easy to use.

The E-Cell SE Kernel

Now that we have an overview of the theoretical foundations of the E-Cell Simulation Environment, we can move on to its implementation, called Libecs, which is the name of the simulator kernel of the E-Cell Simulation Environment. This kernel is written entirely in standard ISO C++, and not only implements the meta-algorithm, but also provides the fundamental API to all the essential features of the core system, such as data logging and model object creation.

With regard to calculation, Libecs does three things. It defines the data structures that represent the state of the model, the data structures that represent the forces acting on the model and the functions that advance time by manipulating these two sets of data.

Data definition in the Libecs implementation begins with the definitions of four basic object classes, which form the parent classes for the different types of model components. Three of these types, called Variable, Process and Stepper, conceptually correspond to the identically named objects in the meta-algorithm (Fig. 2). The fourth, called System, is new; it acts as a type of set that contains Variables as well as other Systems and serves the role of organizing all the state data within the model. To quickly discuss the roles of these objects, Variables represent the basic quantities of model state (the number of some specific chemical species, for example). Process objects represent individual mathematical relationships within the model (such as an equation that relates the values of two model Variables). Stepper objects define computational subunits of a model and control entire sets of Processes by activating the set as a group. That activation, which is called a “step” and is initiated by calling a Stepper's “step()” method, constitutes the atomic action in the discrete event system which Libecs implements. As mentioned above, Systems organize the Variables that make up the model state. In fact, the set of all Variables, as defined in the meta-algorithm, is represented in Libecs by a single System object, called the root system, that acts as the one and only container for state data by containing all other model Systems and Variables inside itself. By creating different combinations of these objects, Libecs can represent any model.

Figure 2. Overview of the Fundamental Classes of E-Cell.

One more common feature of these types must be discussed. In this framework, which aims to represent any generic model by creating combinations of arbitrarily defined Process, Variable and Stepper object types, an important requirement is a way of assigning arbitrary properties of various types to those different objects. For example, any Process that describes some type of reaction between different chemical species requires additional rate information beyond simply specifying reactants and products. Because we wish to be able to build arbitrary models using arbitrary components, Libecs defines a property API, so that arbitrary property names, paired with a mutable data value of a polymorphic type, can be added to an arbitrary object in the Model. Some properties come predefined for all objects. For instance, each Variable object has a “Value” property of either Real type or Integer type, where it might represent a population count. Each System has a “Size” property, a real value that represents the volume of that compartment; every object has a “Name” property that takes a string. Furthermore, by using this interface, models simulated in the E-Cell Simulation Environment have the property that the collection of all object properties is equivalent to the model state (many classes defined by Libecs do have non-property member data, but this all corresponds to data about the state of the simulator itself and not the state of the simulated model).

In order to implement this kind of universal property interface within Libecs, each model object in the kernel derives from a class of type PropertiedClass, which acts as a generic interface to properties of model objects by containing a static map listing all the PropertySlots owned by objects in the Model. A PropertySlot (Fig. 3) is the association of a PropertyName (a string) with a PropertyValue, which is a polymorphic object that can be of type Real, Integer, String or a List (which itself is a list of other polymorphic types, including other Lists). Because each specific property of each individual Model object is associated with a single PropertySlot object, the static map of all these PropertySlots, owned by PropertiedClass, is universal in E-Cell—each Model object can directly access any property of any other Model object as easily as any other.

Figure 3. Object properties in E-Cell SE.

The Libecs implementation of Properties, whose software architecture is shown in Figure 3, has several advantages. First, it allows the basic model objects defined in Libecs to represent any generic type of model object. Any arbitrary set of properties a specific model object might have can be stored as a set of PropertySlots within this interface, uniquely associated with that object. Second, this scheme allows multiple types of property values to be assigned (this is called polymorphic behavior) without sacrificing efficiency. Commonly, a client directly accesses a polymorphic property value, finding the desired PropertySlot using the PropertiedClass interface and then using the generic getProperty() and setProperty() methods of that PropertySlot to access the value. This type of access assumes nothing about the underlying data type and is on the order of performance of standard C++ polymorphism. However, when a particular property must be accessed repeatedly, as is the case where the logging components of the software have to repeatedly access the same PropertySlots in order to record their values through time, the PropertySlot interface can also be used to get a concrete interface, called a PropertySlotProxy, that knows the underlying type of the PropertySlot object and accesses it directly, bypassing polymorphic behavior. When a PropertySlotProxy is created and cached between multiple accesses of a particular PropertySlot, it can be used to speed up access to that Property. Thus, this organization of all state data into different PropertySlots means that all of that data is accessible at any time, even without knowing the type of the data in advance. This organization also has the added advantage that when the client does know the type of the data, this knowledge can be used to increase performance. With these advantages, the Property Interface is a very convenient organizational tool for all the data within an E-Cell Model.
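
A compact Python analogue of this machinery is sketched below. The kernel's version is written in C++ and keeps a static, class-wide map of PropertySlots; here each object carries its own map for brevity, and the proxy simply caches a direct reference in place of the kernel's type-committed fast path.

    class PropertySlot:
        """Associates a PropertyName with a polymorphic PropertyValue."""
        def __init__(self, name, value):
            self.name = name
            self._value = value          # Real, Integer, String or List in the kernel

        def getProperty(self):
            return self._value           # generic, type-agnostic access

        def setProperty(self, value):
            self._value = value

    class PropertiedClass:
        """Base class giving every model object a map of its PropertySlots."""
        def __init__(self):
            self._slots = {}

        def defineProperty(self, name, value):
            self._slots[name] = PropertySlot(name, value)

        def getPropertySlot(self, name):
            return self._slots[name]

    class PropertySlotProxy:
        """Cached accessor held by clients (loggers, for example) that read one slot repeatedly."""
        def __init__(self, slot):
            self._slot = slot

        def get(self):
            return self._slot._value     # stands in for bypassing the generic accessor path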

If we understand that the role of a Process object is to represent a specific algorithm within a model, then we can understand the role of a Stepper object as that of a computational subunit of the model: a Stepper contains and manages a set of Model Processes and acts as an interface that coordinates the execution of subsets of the Processes within its set. Because of the distinction made in the field of discrete event simulation, Libecs divides Processes into two types: Continuous and Discrete. As you might expect, Continuous Processes represent differential equations, which describe how to simulate continuously changing quantities; Discrete Processes represent equations that describe discrete changes to the model state at specific times. Using these definitions as a foundation, Libecs defines four varieties of Steppers, which correspond to the four allowed ways that Processes can be grouped together within Libecs. These four types are DifferentialStepper, DiscreteTimeStepper, DiscreteEventStepper and PassiveStepper (Fig. 4).

Figure 4. Stepper classes of E-Cell SE.

A DifferentialStepper maintains a set of continuous Processes and acts as a unit for solving those differential equations: the individual Processes are the individual equations in the system and the DifferentialStepper is a program that actually solves that set of equations. The specific job of a DifferentialStepper is to act as the differential equation solver for its Processes by determining optimal times for recalculating trajectories, so that recalculation of equations is performed as infrequently as possible while maintaining accuracy. A computational challenge in simulating trajectories of systems of differential equations arises when the set of differential equations contained by a DifferentialStepper is “stiff”. A system of equations is said to be stiff when explicit numerical methods for that system become very inaccurate unless step sizes are small, oftentimes unacceptably so. In this case, implicit methods, which determine the next state by solving equations that involve the unknown future state as well as the current one, become much more efficient (although under nonstiff conditions implicit methods are less effective). The Libecs implementation of the DifferentialStepper type adaptively switches between explicit and implicit methods as the system moves between nonstiff and stiff regimes, using an explicit Dormand-Prince algorithm (an embedded Runge-Kutta method with adaptive step sizing) and an implicit Radau IIA algorithm (among the best implicit Runge-Kutta methods currently known) in order to overcome these problems.
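
The switching idea can be illustrated with a toy sketch for a single scalar equation y' = f(t, y). An explicit Euler step and a Newton-solved implicit Euler step stand in for the far more sophisticated Dormand-Prince and Radau IIA methods actually used by the kernel, so only the shape of the stiffness-driven fallback is shown; the error heuristic is a crude assumption made for illustration.

    def explicit_euler_step(f, t, y, h):
        return y + h * f(t, y)

    def implicit_euler_step(f, dfdy, t, y, h, newton_iterations=8):
        # solve z = y + h * f(t + h, z) by Newton iteration; dfdy is the Jacobian df/dy
        z = y
        for _ in range(newton_iterations):
            residual = z - y - h * f(t + h, z)
            z -= residual / (1.0 - h * dfdy(t + h, z))
        return z

    def adaptive_step(f, dfdy, t, y, h, tol=1e-6):
        """Estimate the explicit method's local error by step doubling; if it exceeds
        the tolerance at this step size, treat the region as stiff and fall back to
        the implicit method."""
        full = explicit_euler_step(f, t, y, h)
        halfway = explicit_euler_step(f, t, y, h / 2)
        doubled = explicit_euler_step(f, t + h / 2, halfway, h / 2)
        if abs(full - doubled) > tol:
            return implicit_euler_step(f, dfdy, t, y, h), 'implicit'
        return doubled, 'explicit'

    # Example (hypothetical stiff test equation): y' = -1000 * (y - cos(t))
    # import math
    # f = lambda t, y: -1000.0 * (y - math.cos(t))
    # dfdy = lambda t, y: -1000.0
    # adaptive_step(f, dfdy, 0.0, 1.0, 0.01) falls back to the implicit branch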

For driving discrete modeling, E-Cell provides three Steppers: DiscreteTime, DiscreteEvent and Passive. DiscreteTimeSteppers are used for algorithms whose Processes fire at intervals independent of the Model state. DiscreteEventSteppers are used for algorithms that represent changes to the system occurring at discrete moments where the actual time of “stepping” depends on the state of the system (an algorithm that calculates population changes by executing one reaction event after another, such as the Gillespie algorithm, is an example of this type). Finally, PassiveSteppers control Processes that never spontaneously fire, where events happen only as a result of specific cues within the environment. These are the four types of Steppers that exist in the Libecs environment and, correspondingly, these are the four types of events that can occur in E-Cell SE.

Last but not least, there is one other major component integral to the operation of the Libecs kernel, the LoggerBroker, which acts as an interface to all features of the kernel involving data logging. Using the LoggerBroker, Libecs can record the values of any or all of the PropertySlots in the Model during the course of simulation. The LoggerBroker object works by creating and managing collections of Logger objects, each of which is associated with a specific PropertySlot in the model (a PropertySlot, as you recall, is the combination of a PropertyName and PropertyValue belonging to an object in the Model). Immediately following the execution of each event during simulation runtime, the LoggerBroker executes its log() method, which records each of the PropertyValues associated with each of its Logger objects. The LoggerBroker interface provides several advantages to the E-Cell Simulation Environment. First, it provides a unified logging API that enables client access to logging capabilities either at the low Libecs level or at the higher architectural levels that users work with (we will later see how the E-Cell SE architecture wraps low-level Libecs capabilities in higher-level functions that can be easily used by human users). Having a single object in charge of this API also allows the logging process to be optimized in two ways. First, logged data is stored in memory more efficiently than might otherwise be possible, because the LoggerBroker internally and periodically moves stored data between memory and the hard disc in order to optimize the handling of large data sets. Second, Logger objects are optimized for speed of access to PropertyValues by obtaining and caching a PropertySlotProxy for the relevant PropertySlot, which allows faster amortized access to the PropertyValue than normal, polymorphic access.
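
The shape of this component can be sketched in Python as follows, building on the PropertySlot classes above; the kernel's Logger writes into a PhysicalLogger and pages data between memory and disk, which the plain Python list used here does not capture.

    class Logger:
        """Records (time, value) pairs for a single PropertySlot."""
        def __init__(self, slot):
            self.slot = slot
            self.data = []                       # stands in for the PhysicalLogger

        def log(self, time):
            self.data.append((time, self.slot.getProperty()))

    class LoggerBroker:
        """Creates Loggers and asks each of them to record after every event."""
        def __init__(self):
            self.loggers = []

        def createLogger(self, slot):
            logger = Logger(slot)
            self.loggers.append(logger)
            return logger

        def log(self, time):
            for logger in self.loggers:
                logger.log(time)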

Now that we are familiar with the individual objects contained within the kernel and understand how the fundamental simulation types can be combined to represent mathematical models, we can proceed to how models are instantiated and simulated through time. When the kernel initializes, it begins by creating an object called a Model. The Model object contains, organizes and coordinates all the data and software components needed to represent a model and provides an interface to all the functionality of the E-Cell kernel: creating and setting up models in preparation for the running of a simulation, stepping the model and logging data through iterations of the meta-algorithm. The Model has three objects which help it to implement these tasks: a root System object, a Scheduler object and a LoggerBroker object. The root System object contains all Variables in the model as well as all other Systems. The Scheduler object contains all the Steppers within the model and organizes the execution of Stepper events, which, when called in order using their individual step() methods, advance the state of the root System object through time. The LoggerBroker object logs object PropertyValues after each cycle of the meta-algorithm. This structure is shown in Figure 5. Please note that the Steppers are contained within the Scheduler and all the Processes in the model are contained within the different Steppers.

Figure 5. An overview of the class structure of the E-Cell SE kernel.

Once the Model class has been initialized, a model is instantiated within it by calling different factory methods for creating Variable, Process and Stepper objects within the Model, one at a time. These factory methods create the new objects either in the root System, in the Scheduler (if it is a Stepper) or in a Stepper (if it is a Process). Once each object has been individually added by Libecs to the Model class, a Model member function called initialize() is called, which prepares all additional data structures needed by the Model class. Most notably, both the Stepper Dependency and global time are set up during this stage, according to the specifications of the meta-algorithm. Likewise, one Event is created for each Stepper (each Event represents the simulation event consisting of the next stepping of its associated Stepper). These events are stored and maintained in an event queue owned by the Scheduler object and sorted by time of next planned stepping. Once the setup of all data structures is completed, advancement of time is ready to begin.
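
The set-up sequence just described can be summarized by a minimal stand-in for the Model class, building on the earlier sketches. The method names mirror the text rather than the kernel's C++ API, and the Stepper objects are assumed to expose a name, a list of Processes and a next_time() method.

    import heapq
    import itertools

    class Model:
        def __init__(self):
            self.variables = {}                  # stands in for the root System
            self.steppers = {}                   # kept by the Scheduler in the kernel
            self.event_queue = []
            self.current_time = 0.0
            self._counter = itertools.count()

        def createVariable(self, name, value=0.0):
            self.variables[name] = Variable(name, value)   # Variable from the first sketch
            return self.variables[name]

        def createStepper(self, stepper):
            self.steppers[stepper.name] = stepper
            return stepper

        def createProcess(self, stepper_name, process):
            self.steppers[stepper_name].processes.append(process)
            return process

        def initialize(self):
            # build the Stepper Dependency and schedule one Event per Stepper
            self.dependency = stepper_dependency(self.steppers.values())
            for stepper in self.steppers.values():
                heapq.heappush(self.event_queue,
                               (stepper.next_time(self.current_time),
                                next(self._counter), stepper))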

Implementation of the meta-algorithm (Fig. 6) is spread throughout the kernel. The entry point to the meta-algorithm is in Model's step() function, which executes one iteration of the meta-algorithm. Generally, this process consists of determining the time of the next occurring event and setting global time accordingly, integrating the model state to the new current time, stepping the scheduled Stepper by calling its step() method, which causes some Processes to be fired, logging the changes and using the StepperDependency to notify any dependent Steppers so they can reschedule themselves as well as modify any internal parameters as needed.

Figure 6. The time-advancement process in the E-Cell SE kernel.

As mentioned, the first activity performed during a simulation cycle is a determination of the time of the next scheduled event. The Scheduler's event queue begins each step() cycle as a time-ordered list of pending events. Therefore, finding the time simply consists of inspecting the scheduled execution time of the Event at the very top of the Event Queue.

The next step is to advance time by integrating all reference variables (the combined list of the variables each Process within that Stepper must read) associated with the about-to-be-executing Stepper. This is done by calling the integrate() method for each Variable in the executing Stepper's variable reference list, which uses recorded interpolants of that variable to extrapolate the future value at any specified time. The way this integration procedure works is related to the way in which velocity changes are recorded by Variables. Each Variable that is to be continuously modified is by definition a mutator reference for some Continuous Stepper registered with the kernel; each such Continuous Stepper contains an Interpolant class and during initialization each such Stepper registers an instance of the Interpolant class with each of its mutator reference Variables. When a Continuous Process needs to add a velocity change to a Variable, it does so by passing the changes through the Variable's Interpolant class, which translates velocity changes into interpolant values. When a Variable is called upon to integrate itself to a current time, it uses these interpolant values to calculate interpolant differences, summing over these differences to approximate the value of the Variable at the specified time. Thus, using its interpolant coefficients that are guaranteed to be up-to-date at each point of simulation, a Variable can give its value for any moment in time.
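
A simplified version of this mechanism, extending the Variable class from the first sketch, is shown below. A single piecewise-constant velocity stands in for the higher-order interpolant coefficients kept by the kernel, and the class names follow the text rather than the actual implementation.

    class Interpolant:
        """Registered by a continuous Stepper with each Variable it writes; converts
        that Stepper's velocity contributions into value differences over time."""
        def __init__(self):
            self.velocity = 0.0
            self.last_time = 0.0

        def set_velocity(self, velocity, time):
            self.velocity = velocity
            self.last_time = time

        def difference(self, time):
            return self.velocity * (time - self.last_time)

    class ContinuousVariable(Variable):
        def __init__(self, name, value=0.0):
            super().__init__(name, value)
            self.interpolants = []       # one per continuous Stepper writing this Variable

        def register_interpolant(self, interpolant):
            self.interpolants.append(interpolant)

        def integrate(self, time):
            # sum the interpolant differences to bring the value up to the given time
            for interpolant in self.interpolants:
                self.value += interpolant.difference(time)
                interpolant.last_time = time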

Once integration is completed, the scheduler executes the action of the Stepper, by calling its step() method, which fires some subset of the Processes associated with this Stepper. This step is general (it is implemented as a virtual method) and its exact behavior depends heavily on the type of Stepper. For example, in a DifferentialStepper, which contains Processes corresponding to differential equations, this method consists of calculating and updating velocities of variables through Interpolant classes, along with calculating approximate time steps for the next execution of the Stepper. However, in a DiscreteTimeStepper, this method consists simply of discretely updating the values of variables by firing the Processes within the Stepper.

Next comes logging: the Stepper being stepped executes its log() method, which tells each logger associated with a PropertySlot of a Variable referenced by that Stepper to record the value in that PropertySlot at the given time. This procedure is shown in Figure 7. The result is that each logger accesses its associated PropertySlot value through its PropertySlotProxy and inserts it into its PhysicalLogger object for recording.

Figure 7. The logging process in the E-Cell SE kernel.

Finally, using the time of next stepping calculated during its step() method, the Stepper is rescheduled in the Scheduler's event queue.

At this point, all state changes have been propagated into the model and time has been updated. The final activities of the meta-algorithm consist of resolving the model to incorporate the state changes made during the current iteration by interrupting and rescheduling each Stepper that is dependent on the current one. This interruption process may change any implementation variables owned by the Stepper, most notably the next time of stepping. PassiveSteppers also fire their Processes here, because they only execute after being interrupted by some occurrence. After being interrupted, each Stepper that updates its next firing time reschedules itself, so that the Event Queue remains ordered by time as a postcondition of the Model's step() method. At this point, an iteration of the meta-algorithm has been completed and the kernel is ready to advance time once again, choose the next Stepper to execute and so forth.

The major advantage to this architecture is due to the generic interfaces belonging to its fundamental classes. Specifically, because a Process class is only required to define initialize() and fire() methods, new Processes can be programmed and dynamically loaded by the Libecs kernel during runtime. As long as a modeling formalism can be encoded into an algorithm, it can be compiled as an E-Cell plug-in module and loaded into the E-Cell Simulation Environment.

Interfaces to the Kernel

From a scientific programming perspective, Libecs is a complete implementation of a generic simulation platform. From the perspective of software engineering, it is not enough. Although the framework is extensible through the common algorithm interface provided by Libecs, it is cumbersome to invoke the core system library directly. Therefore, the kernel is wrapped in a Python interface layer to aid in programming, scripting and providing front ends to the kernel.

The Python interface layer API provides a thin interface to the kernel and is structured around a Session object. The Session object provides an interface for setting up models, running simulations and scripting sets of simulation runs. Methods provided by the Session API can be divided into five types. Entity and Stepper methods allow for individually creating or accessing these objects within a model. Logger methods allow for adding Loggers to a model, as well as saving the data recorded by those Loggers. Simulator methods allow for advancing time within a model, either by some fixed amount of time or by some number of steps. Finally, Session methods provide high-level functions for running the E-Cell Simulation Environment, most notably, methods for loading and saving models from E-Cell Model Language files (EML files). This Python API is covered in detail in the E-Cell Manual.3
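
To give a flavor of how the Session object is used in practice, the following is a hypothetical session script of the kind the ecell3-session front end executes. The function names and the fully qualified property name are taken from our reading of the E-Cell 3 manual and should be treated as assumptions to be checked against that documentation rather than as authoritative usage.

    # Hypothetical E-Cell session script (ESS); names are assumptions, see above.
    loadModel('simple.eml')                          # read an EML model file into the kernel

    logger = createLoggerStub('Variable:/:C:Value')  # attach a Logger to one PropertySlot
    logger.create()

    run(100)                                         # advance simulated time by 100 seconds
    saveLoggerData('Variable:/:C:Value')             # write the recorded trajectory to disk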

For E-Cell development, it is important to note that the Python interface layer does not wrap the kernel directly. Instead, a C++ micro-core layer called libemc is built on top of the kernel and exposes many of the functions found in the Python interface. This layer is then wrapped in Python and combined with additional Python code, known collectively as PyEcell, to produce the complete Python API upon which the front ends to E-Cell are built. This layered architecture is presented in Figure 8.

Figure 8. The architecture of the E-Cell Simulation Environment.

Built on top of the Python API are three front ends provided by E-Cell: ecell3-session-monitor, ecell3-session and ecell3-session-manager.3 Ecell3-session-monitor is a graphical user interface well suited to interactive model editing and to running individual simulations. It is especially useful to researchers making an initial investigation of a model's behavior, since it allows the model to be examined and analyzed at any level, from individual components to the system as a whole, through a graphical interface. The second major front end, the ecell3-session command, provides a command line interface suitable for scripting and for automating the processing of large models. This command line mode is an extension of a Python shell that directly reflects the Python Session API. The final front end is ecell3-session-manager, which is designed for running many sessions in parallel in a grid or cluster environment. Ecell3-session-manager provides three classes, SessionManager, SessionProxy and SystemProxy.4 Used in tandem, these associate the E-Cell SE with a computing environment, whether a single computer, a grid or a cluster, and automate the running of large numbers of similar models by creating jobs and farming them out to the execution environments registered with the system.

While these tools have been designed to match the tasks users commonly face as they attempt to extract biological understanding from models, they also illustrate the extensible environment that the E-Cell Simulation Environment provides. E-Cell SE has been designed so that, as far as possible, users can modify or extend it according to their own needs. Wrapping the core simulation code in the API of an easily used programming language such as Python realizes this goal, allowing the software to be used in nearly unlimited forms.
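
The register-jobs-then-farm-them-out pattern these classes implement can be illustrated with the self-contained toy below. It is emphatically not the ecell3-session-manager API: every class, method and argument name here is hypothetical and exists only to show the structure described above.

# Self-contained toy illustrating the job-farming pattern described above.
# This is NOT the ecell3-session-manager API; all names are hypothetical.

class ToyJob:
    def __init__(self, script_path, parameters):
        self.script_path = script_path
        self.parameters = parameters       # values to be injected into the session script
        self.finished = False

class ToySessionManager:
    """Registers jobs and farms them out to one registered execution backend."""

    def __init__(self, backend):
        self.backend = backend             # callable: backend(job) runs one job
        self.jobs = []

    def register(self, script_path, parameters):
        job = ToyJob(script_path, parameters)
        self.jobs.append(job)
        return job

    def run_all(self):
        for job in self.jobs:              # a grid or cluster backend could run these in parallel
            self.backend(job)
            job.finished = True

# Example: sweep a rate constant across three values with a do-nothing backend.
manager = ToySessionManager(backend=lambda job: print(job.script_path, job.parameters))
for k in (0.1, 0.5, 1.0):
    manager.register('sweep.ess', {'K': k})
manager.run_all()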

Future Directions

While our tour of the E-Cell SE architecture is nearly complete, it is informative to look at what the future holds for E-Cell development. Two major features currently being prepared for the next major release of E-Cell are spatial modeling and support for a dynamic model structure. Currently there is no direct support for representing spatial location within E-Cell. One goal of the project is to encourage the perspective that models are most effectively built by applying the tools of reductionism and choosing the most appropriate algorithm for each sub-system. Because biologists can point to systems, such as diffusion, active transport, molecular crowding and cytoskeletal movement, where spatial views are important for understanding, it is clear that prepackaged, “out-of-the-box” spatial algorithms and representations are necessary in a biological modeling and simulation platform such as the E-Cell Simulation Environment. Spatial support will be added to E-Cell in the form of multiple spatial representations in which objects can exist and interact, such as continuous three-dimensional space or lattices of discretised regions of space.

The second major development will be support for a dynamic model structure, including the creation and deletion of objects. Fundamentally, this is important because biological systems are themselves dynamic: within them, objects are created and destroyed constantly, and modeling them appropriately requires the same dynamic abilities. One specific example of how such a feature might be useful can be found in the study of multi-protein complexes. In many intra-cellular processes, such as signal transduction, a phenomenon known as combinatorial explosion is often present. It arises when a relatively small number of proteins can combine in regular ways, so that the number of complexes that can potentially form is enormous, far greater than could ever be sensibly enumerated. Because of this, support for dynamic model structure must be provided so that these species no longer need to be enumerated a priori, but can instead be created and added to the model at runtime, along with procedures for informing the rest of the model of the changes.

An E-Cell System incorporating these features is currently under development. These capabilities will be critical to the biological modeling and simulation needed to make the most accurate possible models of complex systems.

Conclusion

Development of the E-Cell Simulation Environment has been motivated by the belief that large-scale complex models can best be created and understood by composing models written with arbitrary algorithms. E-Cell supports this with a meta-algorithm incorporating a unique plug-in architecture that allows new algorithms to be written and seamlessly integrated into the E-Cell Simulation Environment. A secondary consideration in the design of E-Cell is the belief that the software should be extensible and customizable by its users. While the default distribution of the E-Cell Simulation Environment provides several programs users should find quite useful, it will always be possible to write new interfaces that allow simulation using the E-Cell SE. In this way, E-Cell SE can remain a relevant simulator far into the future. As new algorithms are developed, they can easily be incorporated into E-Cell SE; as new workflows become needed, E-Cell SE can be molded to fit the required niche. We therefore expect this generic platform to prove increasingly relevant, providing users with all the power and flexibility they need even as their ambitions grow.

References

1. Takahashi K. Multi-Algorithm and Multi-Timescale Cell Biology Simulation [PhD thesis]. Keio University; 2004.
2. Zeigler BP, Praehofer H, Kim TG. Theory of Modeling and Simulation: Integrating Discrete Event and Continuous Complex Dynamic Systems. 2nd ed. San Diego, London: Academic Press; 2000.
3. Takahashi K, Addy N. E-Cell Simulation Environment Version 3.1.105 User's Manual. 2006. Available at: http://www.e-cell.org/software/documentation/ecell3-users-manual.pdf. Accessed 2006.
4. Sugimoto M, Takahashi K, Kitayama T, et al. Distributed Cell Biology Simulations with E-Cell System. In: Konagaya A, Satou K, eds. Lecture Notes in Computer Science. Berlin: Springer-Verlag; 2005:20-31.
