

# Integrated stoichiometric, thermodynamic and kinetic modelling of steady state metabolism

^{*}Corresponding author. Present address: Science Institute & Center for Systems Biology, University of Iceland, Reykjavik, Iceland. Tel: +354 618 6245

## Abstract

The quantitative analysis of biochemical reactions and metabolites is at the frontier of biological sciences. The recent availability of high-throughput technology data sets in biology has paved the way for new modelling approaches at various levels of complexity, including the metabolome of a cell or an organism. Understanding the metabolism of single-cell and multi-cell organisms will provide the knowledge for the rational design of growth conditions to produce commercially valuable reagents in biotechnology. Here, we demonstrate how equations representing steady state mass conservation, energy conservation, the second law of thermodynamics, and reversible enzyme kinetics can be formulated as a single system of linear equalities and inequalities, in addition to linear equalities on exponential variables. Even though the feasible set is non-convex, the reformulation is exact and amenable to large-scale numerical analysis, a prerequisite for computationally feasible genome scale modelling. Integrating flux, concentration and kinetic variables in a unified constraint-based formulation is aimed at increasing the quantitative predictive capacity of flux balance analysis. Incorporation of experimental and theoretical bounds on thermodynamic and kinetic variables ensures that the predicted steady state fluxes are both thermodynamically and biochemically feasible. The resulting *in silico* predictions are tested against fluxomic data for central metabolism in *E. coli* and compare favourably with *in silico* predictions by flux balance analysis.

**Keywords:** systems biology, constraint-based modelling, linear polytope, logarithmic polytope, algebraic geometry

## 1 Introduction

It has long been appreciated that biological systems are subject to physico-chemical laws. These laws impose certain constraints on the feasible set of enzymatic reaction rates and metabolite concentrations. Within the physico-chemically feasible set, the actual values of evolved kinetic parameters will determine the functional state of the cell. In the absence of experimentally determined kinetic parameters, it is still possible to mathematically formulate and computationally implement systems of equations, the solutions to which represent a physico-chemically feasible set of functional states. Ideally, one seeks to incorporate all relevant physico-chemical constraints in a single framework. This brings a model closer to our conception of reality, reduces the size of the feasible set, and is aimed at increasing the predictive capacity of such models. However, it is not always computationally tractable to apply all known physical laws since the resulting equations are difficult to solve. This is especially so for genome scale models where there are typically a large number of equations and variables.

Flux balance analysis is a standard constraint-based modelling approach, based on the assumption that mass is conserved in a biochemical system of reactions (40). In addition, within the timescale of interest, often one can reasonably assume that the concentrations of metabolites are time invariant, that is, the system is assumed to be at a steady state. These two constraints define a convex polytope of feasible reaction fluxes, even though absolute metabolite concentrations are not explicitly represented. Herein we derive a formalism where reaction flux obeys mass conservation, energy conservation, the second law of thermodynamics, and mass action kinetics. The resulting constraint equations define a non-convex feasible set, in which metabolite concentrations and kinetic parameters are explicitly represented. We also demonstrate the application of this modelling approach with an example using a model of central metabolism in *E. coli*.

In addition, we illustrate theoretically how experimental knowledge of physiologically feasible ranges and theoretical biochemical predictions of feasible ranges of other pertinent variables may be incorporated into this new modelling framework (21). Natural selection has to operate within the constraints enforced by physical laws. By integration of genome scale stoichiometric, thermodynamic and kinetic constraints, in a framework amenable to numerical analysis, we aim to investigate *in silico* the mechanisms responsible for the results of adaptive evolution experiments. This is the ultimate goal, but there are significant mathematical and computational modelling challenges which must be overcome before these biologically more pertinent issues can be addressed.

Stoichiometric models of metabolism continue to grow in size due to new models of higher organisms (51) and incremental extension of existing models (20, 53). As the dimensions of the stoichiometric matrix increase, computational complexity theory dictates that some computational analysis techniques are no longer practical due to memory or time constraints (55). This problem is exemplified by vertex enumeration of a convex polytope, for which no polynomial time algorithm is known (3). Vertex enumeration is the key step in elementary mode, or extreme pathway analysis (40). Unless significant advances are made in reducing the complexity of vertex enumeration, one cannot hope to utilise this analysis method for the latest genome scale models (20). An additional problem is that existing methods of low complexity may be numerically unstable for large matrices with coefficients spread over many orders of magnitude. The application of less conventional linear algebra to design a more numerically stable, yet low complexity algorithm can help to alleviate such problems (47).

In tandem with growth of stoichiometric matrices, there are efforts to increase their predictive capacity by incorporating additional constraints, beyond mass conservation with a steady state assumption. The addition of Boolean genetic regulatory constraints, to flux balance models of *E. coli* metabolism, increases their qualitative predictive capacity when compared to experimental data of mutant growth capability on various media (14). Predictions of qualitative growth capability or quantitative growth rate (28) are interpretations of the numerical values of exchange fluxes from flux balance analysis of stoichiometric models. It is not possible to reliably predict internal fluxes in genome scale metabolic models without additional thermodynamic constraints. The simple fact is that stoichiometrically balanced loops are invariably present in such networks and, with only stoichiometric constraints, they exhibit cyclic net fluxes entirely independent of exchange fluxes. Energy conservation, and the second law of thermodynamics, can be combined with stoichiometric constraints to retain only mass balanced and thermodynamically feasible fluxes (6). However, although such fluxes obey mass conservation, energy conservation and the second law of thermodynamics, additional constraints are required to make them biochemically feasible.

Standard chemical potential forms a bridge between metabolite concentration and metabolic flux in thermodynamically constrained stoichiometric models. Experimentally, one can measure standard chemical potential and combine this data with stoichiometric models. This approach has shown promise in constraining viable fluxes and assisting interpretation of metabolome data (32). Theoretically, one can estimate standard chemical potential using group contribution methods (35, 24, 31). The group contribution method uses knowledge of the chemical energy within recurring chemical bonds and enumerates the prevalence of these bonds, within a compound of known structure, to estimate the entire standard chemical potential. Using this method, it is possible to estimate the standard chemical potential of almost all compounds in *E. coli* (20). These estimated results compare favourably with experimental measures, provided that the associated uncertainty in estimation is acknowledged (23, 31). Experimental and theoretical approaches are desirable and complementary, since it is laborious and difficult to measure all the values experimentally whereas estimates need experimental confirmation.

Using standard chemical potential and metabolite concentration, one can place upper and lower bounds on the *in vivo* change in chemical potential of almost all reactions in genome scale stoichiometric models of metabolism (20). Given a stoichiometric model where all reactions were initially reversible, quantitative bounds on change in chemical potential, in combination with mass conservation, energy conservation and the second law of thermodynamics, constrain the feasible steady state fluxes to thermodynamically feasible and physiologically relevant solutions. Without such bounds on chemical potential, the set of thermodynamically feasible solutions could include degradation of biomass into external nutrient metabolites, the opposite of the more probable biomass synthesis direction. Maximising bacterial growth, constrained by mass balance, energy conservation and the second law of thermodynamics, integrated with standard chemical potential and metabolite concentration estimates, has recently been shown to significantly improve the correspondence between computed and experimental internal flux rates in *E. coli* (26).

With the size of stoichiometric models increasing and simultaneously the scope of constraints expanding, a pressing problem is to find a mathematical formalism which encompasses all current constraints in a concise form yet lends itself to robust, scalable numerical analysis. In addition, it is desirable to use a form which is invariant to the addition of new, possibly nonlinear, constraints. In particular, efforts are under way to collate the kinetic equations and parameters of enzymes in model organisms (42), including *E. coli*. Building on earlier work (41), herein we present a mathematical formulation which includes steady state mass conservation, energy conservation, the second law of thermodynamics, and extends to encompass a system of reversible kinetic equations where bounds on standard chemical potential, metabolite or enzyme concentration and kinetic variables can be incorporated. We refer to kinetic variables rather than kinetic parameters to emphasise that this is a constraint-based approach to steady state kinetics where the variables are not exactly specified, but constrained to a bounded interval. The resulting system of simultaneous linear equations in linear variables, combined with linear equations in logarithmic variables, shows a strong mathematical similarity with the system of equations studied in chemical reaction equilibrium analysis (50, 46). The outline of the mathematical description below mimics the order of the aforementioned constraints. During the description we highlight salient mathematical features and use simple toy metabolic examples for illustration of key concepts. Initial efforts to apply this modelling approach to a genome scale model of *E. coli* metabolism are also illustrated.

## 2 Mass conservation at steady state

The set of chemical reactions in a metabolic network can be represented as a set of chemical equations. Embedded in these chemical equations is information about reaction stoichiometry. Stoichiometry represents the number of molecules of reactant consumed, and product produced, in a single chemical reaction. Since elements, such as carbon, nitrogen and oxygen, are neither created nor destroyed in a biochemical reaction, the total number of each element, or moiety, is conserved even though their relative arrangements may change as old chemical bonds are broken and new ones are formed. All of the stoichiometric information about a metabolic network can be represented within a matrix, the stoichiometric matrix, $\mathbf{S}\in\mathbb{Z}^{m,n}$, with rows $i\in 1\ldots m$ and columns $j\in 1\ldots n$, where *m* is the number of metabolites and *n* is the number of reactions (e.g. Supplementary Material A). Each elementary reversible chemical transformation is represented by a pair of conjugate columns of the stoichiometric matrix, one forward elementary reaction and one reverse elementary reaction, in odd columns, *j*, and even columns, *j* + 1, respectively. Each row represents the participation of a single metabolite in all possible reactions. An integer coefficient in a column of a stoichiometric matrix gives the number of molecules of a single metabolite consumed or produced in that reaction. The stoichiometric matrix linearly transforms a flux vector, $\mathbf{v}\in\mathbb{R}_{>0}^{n}$, into a vector of metabolite concentration time derivatives, $\frac{d\mathbf{m}}{dt}$, giving the fundamental equation of dynamic mass conservation (27), which, with the assumption of steady state concentration, gives

$$\mathbf{S}\cdot\mathbf{v}=\mathbf{0}. \qquad (1)$$

The steady state assumption asserts that the concentration of metabolites is time invariant. Since Eq. 1 is generally under-determined, *rank*(**S**) < *m* < *n*, there are many flux vectors which satisfy the steady state mass conservation constraint.
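As a minimal sketch of the steady state mass conservation constraint, the following checks the balance for a hypothetical three-metabolite network in which A, B and C are interconverted by reversible isomerases and A and C are exchanged with the environment. The network layout, the reaction ordering and the flux values are illustrative assumptions and are not taken from the model used in this paper.

```python
# Toy stoichiometric matrix (assumed for illustration): rows are metabolites
# A, B, C; columns are forward/reverse pairs of elementary reactions.
REACTIONS = ["A->B", "B->A", "B->C", "C->B", "A->C", "C->A",
             "->A", "A->", "C->", "->C"]
S = [
    [-1,  1,  0,  0, -1,  1,  1, -1,  0,  0],  # A
    [ 1, -1, -1,  1,  0,  0,  0,  0,  0,  0],  # B
    [ 0,  0,  1, -1,  1, -1,  0,  0, -1,  1],  # C
]


def mass_balance_residual(S, v):
    """Return S . v, the rate of change of each metabolite concentration."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S, v):
    """True if every elementary flux is positive and S . v = 0."""
    residual = mass_balance_residual(S, v)
    return all(v_j > 0 for v_j in v) and all(r == 0 for r in residual)


# Hypothetical elementary fluxes, ordered as in REACTIONS.
v = [2, 1, 2, 1, 4, 1, 5, 1, 5, 1]
residual = mass_balance_residual(S, v)

# Lowering the uptake of A by one unit leaves A unbalanced.
v_unbalanced = [2, 1, 2, 1, 4, 1, 4, 1, 5, 1]
```

Because the system is under-determined, many other positive vectors pass the same test; for instance, multiplying `v` by any positive constant leaves the residual at zero.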

## 3 Energy conservation and the second law of thermodynamics

Theoretical application of thermodynamic principles to chemical networks in terms of differential geometry, algebraic topology and circuit theory was considered by Oster and Perelson in the 1970s (38, 39). This work remains to fully impact systems biology, most likely because of a dearth of advanced mathematical training amongst the systems biology community. When applied to biochemical networks, energy conservation, and the second law of thermodynamics, prevent net flux around stoichiometrically balanced cycles within metabolic networks, as emphasised by Beard, Liang & Qian (6). A finer point to note here is the implicit assumption that thermodynamically unfavourable biochemical transformations cannot be driven by a gradient of temperature; this may not be the case for endothermic chemotrophs (33). Given the entire set of elementary reactions in a metabolic network, a stoichiometrically balanced cycle is a subset of contiguous reactions that form a subnetwork of chemical transformations that are perfectly mass balanced, without a member of the subset being an exchange reaction. An example of such a stoichiometrically balanced cycle is given in Figure 1. By the second law of thermodynamics, net reaction flux from reactants to products requires a concomitant thermodynamic driving force, provided by a drop in chemical potential from reactants to products. In Figure 1, if *A* has a greater chemical potential than *B*, and *B* greater than *C*, then by the principle of conservation of energy, *C* must have lower chemical potential than *A*. By the second law of thermodynamics, the *net flux* is in the direction of decreasing chemical potential, therefore there must be net flux in the direction *A* → *C*. Therefore, there is no *net flux* around the cycle *A* → *B* → *C* → *A*. Many more complicated stoichiometrically balanced cycles exist in genome scale models, to the extent that they cannot be identified by sight alone, nor enumerated using vertex enumeration algorithms (3), in the latest genome scale models for *E. coli* (20).

Quantitative thermodynamic constraints prevent net flux around stoichiometrically balanced loops but it is possible to have no net flux around a loop yet still violate thermodynamic constraints. Consider Figure 1, if *A* is in equilibrium with *B* and *B* is in equilibrium with *C* then by the second law of thermodynamics, there is no net flux *A* → *B* → *C* and hence no net flux around the cycle. In flux balance analysis models, net flux between *A* → *C* is still possible even if there is no net flux *A* → *B* → *C*. However, net flux *A* → *C* violates energy conservation since *A* must be in equilibrium with *C*, when *A* is in equilibrium with *B* and *B* is in equilibrium with *C*. Conservation principles and model consistency from different perspectives are two sides of the same coin.

Let us consider the first *ñ* columns of the stoichiometric matrix corresponding exclusively to internal reactions, denoted $\tilde{\mathbf{S}}\in\mathbb{Z}^{m,\tilde{n}}$. If one computes a rational basis, $\mathbf{K}\in\mathbb{Q}^{\tilde{n},k}$, for the null space of $\tilde{\mathbf{S}}$, then **K** spans the vector space of stoichiometrically balanced loops at steady state and, by the definition of a null space, $\tilde{\mathbf{S}}\cdot\mathbf{K}=\mathbf{0}$. Multiplying across by a vector of metabolite chemical potentials, $\mu\in\mathbb{R}^{1,m}$, gives $\mu\cdot\tilde{\mathbf{S}}\cdot\mathbf{K}=\mathbf{0}$. Since $\mu\cdot\tilde{\mathbf{S}}=\Delta\mu$, the vector of change in chemical potential for internal reactions, by substitution then transposing, one arrives at

$$\mathbf{K}^{\top}\cdot\Delta\mu^{\top}=\mathbf{0}. \qquad (2)$$

This is an invariant, first derived by Beard and Qian (6), on total change in chemical potential at steady state for each stoichiometrically balanced loop and linear combinations thereof. Biochemically it enforces conservation of chemical potential energy by internal reactions of a stoichiometric network. By the second law of thermodynamics, net flux is in the direction of a drop in chemical potential, so this constraint is required in addition to energy conservation in order to eliminate flux around these loops.
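A rational null space basis of the kind described above can be computed exactly with rational arithmetic. The sketch below applies plain Gaussian elimination to a toy internal stoichiometric matrix for three reversible isomerase reactions between A, B and C (an assumed layout for a cycle such as the one in Figure 1). It is an illustration only; the function names are hypothetical and this is not the algorithm used for genome scale models.

```python
from fractions import Fraction


def rational_nullspace(matrix):
    """Return basis vectors (lists of Fraction) spanning the null space."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0])
    pivots = []
    r = 0
    for c in range(n_cols):
        if r == len(rows):
            break
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [x / pivot for x in rows[r]]
        # Eliminate column c from every other row (reduced row echelon form).
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n_cols) if c not in pivots]
    basis = []
    for f in free:
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            vec[pc] = -rows[row_idx][f]
        basis.append(vec)
    return basis


def matvec(matrix, vec):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vec)) for row in matrix]


# Internal reactions only: A->B, B->A, B->C, C->B, A->C, C->A.
S_internal = [
    [-1,  1,  0,  0, -1,  1],  # A
    [ 1, -1, -1,  1,  0,  0],  # B
    [ 0,  0,  1, -1,  1, -1],  # C
]
K = rational_nullspace(S_internal)
```

With forward and reverse columns kept separate, the basis contains each forward/reverse pair as a trivial loop in addition to the cycle through A, B and C, so the null space of this toy matrix has dimension four.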

The net steady state change in chemical potential with respect to the forward reaction, $\Delta\mu_{j}$, is given by the logarithm of the ratio of reverse divided by forward flux, scaled by temperature, *T*, and the gas constant, *R*,

$$\Delta\mu_{j}=RT\,\ln\left(\frac{\mathbf{v}_{j+1}}{\mathbf{v}_{j}}\right). \qquad (3)$$

Here we assign the odd index *j* to the forward reaction, and even index *j* + 1 to the reverse reaction. The change in chemical potential with respect to the forward direction is equal to the change in chemical potential with respect to the reverse reaction if one changes its sign. Eq. 3 ensures that net flux is in the direction of a drop in chemical potential energy, in accordance with the second law of thermodynamics. Assuming the existence of a chemical potential for each metabolite, mass action kinetics, constant temperature and pressure, within a finite volume element, this thermodynamic relationship applies to elementary reversible reactions arbitrarily far from equilibrium (43). Obviously, at equilibrium, the rate of the forward and reverse reactions is equal so there is no change in chemical potential. The connection between Eq. 3 and the work of Hill on catalytic cycle fluxes (25), Ussing, Hodgkin and Huxley on ion transport, and Crooks on entropy production in microscopically reversible systems (15) has recently been discussed (7). It is fair to say that the exact validity of this relation, beyond the assumption of an elementary reaction, is still an open issue.
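The two requirements can be checked numerically for a single loop. The sketch below computes the change in chemical potential of each step from its reverse and forward elementary fluxes, as described above, and sums the result around the cycle A → B → C → A. The flux values and the temperature are assumptions chosen for illustration.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K)
T = 310.15          # assumed temperature in K


def delta_mu(v_forward, v_reverse):
    """Change in chemical potential with respect to the forward direction."""
    return R * T * math.log(v_reverse / v_forward)


def loop_potential_sum(pairs):
    """Sum delta_mu over steps traversed in their forward direction around a loop."""
    return sum(delta_mu(v_f, v_r) for v_f, v_r in pairs)


# Loop A -> B -> C -> A; each pair is (flux along the loop, flux against it).
# The last pair is the step C -> A, so its second entry is the flux A -> C.
consistent = [(2.0, 1.0), (2.0, 1.0), (1.0, 4.0)]
inconsistent = [(3.0, 1.0), (3.0, 1.0), (1.0, 2.0)]

sum_consistent = loop_potential_sum(consistent)
sum_inconsistent = loop_potential_sum(inconsistent)
```

In both flux sets the net flux runs A → B, B → C and A → C, so neither has net flux around the cycle. Only the first has potential changes that sum to zero around the loop; the second violates energy conservation even though no cyclic net flux is present.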

By considering all chemical transformations to be reversible, rather than approximating certain steps with ’irreversible’ reactions, we see with Eq. 3 that assigning zero flux to a reverse reaction dictates that the change in chemical potential, for the ’irreversible’ forward reaction, should be negative infinity. Such a situation is not physically possible as all changes in chemical potential are finite. This is a difference between kinetic studies, which routinely assume irreversibility, and thermodynamic studies, which cannot. Given Eq. 2, by substituting an appropriately signed vector of equations 3 in place of $\Delta\mu$, after some algebraic manipulation we derive the novel equation

$$\mathbf{P}\cdot\ln(\tilde{\mathbf{v}})=\mathbf{0}, \qquad (4)$$

where $\mathbf{P}\in\mathbb{R}^{k,\tilde{n}}$ and $\ln(\tilde{\mathbf{v}})$ denotes the component-wise natural logarithm of each flux in the vector of internal fluxes (see Supplementary Material B for details and an example). Eq. 4 simultaneously applies energy conservation and the second law of thermodynamics as constraints on logarithmic elementary flux. One advantage of this novel reformulation is that it eliminates thermodynamically infeasible fluxes without dependence on *a priori* knowledge of change in chemical potential, cf. (6). This is important since standard chemical potential is unknown experimentally for certain metabolites and accurate theoretical estimates are difficult because the metabolites contain substructures for which no group contribution estimate is known (31). Another advantage is that it explicitly considers elementary forward and reverse flux, rather than net flux, which is important in isotopomer analysis (49).

## 4 Kinetic constraints

The vast majority of kinetic parameters available in the literature are inappropriate for models of *in vivo* metabolism because they are byproducts of enzyme mechanistic studies carried out at temperature and pH far from *in vivo* conditions (13). However, mechanistic enzyme classification also indicates a specific mathematical form of kinetic equation for an enzyme in accordance with its mechanism. Different enzyme classes, with distinct kinetic equations, specify different potential relationships between their variables independent of actual numerical values for kinetic variables. For example, consider two bisubstrate reactions, one with a sequential ordered mechanism versus one with a ping pong mechanism (11). Since their rate equations are different, the relationship between the possible values of their variables is different. Just as the stoichiometric matrix constrains flux distributions, kinetic equations constrain distributions of kinetic variables, metabolite and enzyme concentrations at steady state. Since flux is a function of kinetic variables, metabolite and enzyme concentration, kinetic equations implicitly constrain flux distributions. The kinetic equations for many *E. coli* enzymes are available in the literature and, as a prerequisite for genome scale modelling, are in the process of being collated in a manually curated database (42). Even in cases where the rate equation is unknown, one can approximate the kinetics with a simple mass action rate law.

### 4.1 Phenomenological versus elementary kinetic variables

Reversible kinetic equations typically appear in one of two mathematical forms depending on whether phenomenological or elementary kinetic variables are used. Phenomenological variables are microscopic properties of enzymes (18) but are more easily measured in macroscopic experiments with ensembles of enzymes (12), whereas elementary kinetic variables correspond to discrete approximations of major conformational transitions and are much more difficult to measure experimentally. Consider one of the isomerase reactions from Figure 1 modelled as a three step kinetic mechanism

with associated elementary reaction kinetic variables, *k*. Reactions are modelled stoichiometrically as a one step mechanism as it is the net stoichiometry of the entire reaction which acts as a constraint. The phenomenological, reversible, Michaelis-Menten kinetic equation for mechanism 5 is

where *υ*_{net} is the net flux through the reaction, *a* and *b* are substrate and product concentrations, and *e*_{AB} is the total enzyme concentration. Specificity constants, *k*_{A} and *k*_{B}, reflect the specificity of the enzyme for *A* or *B* in the presence of competing substrates. Michaelis constants, *K*_{mA} and *K*_{mB}, are apparent dissociation constants. Phenomenological kinetic variables are functions of overlapping sets of elementary kinetic variables so they are not independent. In the case of reaction 5

These functions can be viewed as implicit polynomial equations relating experimentally measurable quantities to quantitatively modelled kinetic variables, see Supplementary Material B. This facilitates the incorporation of appropriate experimental kinetic data (42) or theoretical predictions on optimal enzyme characteristics (21).
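The dependence of phenomenological on elementary kinetic variables can be illustrated with a small exact calculation. The sketch below assumes the three step mechanism E + A ⇌ EA ⇌ EB ⇌ E + B with hypothetical forward constants `k1`, `k2`, `k3` and reverse constants `km1`, `km2`, `km3`; these labels, the numerical values and the particular reversible Michaelis-Menten form used are assumptions made for illustration. It solves the steady state of the enzyme species directly and compares the resulting net flux with the phenomenological expression.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def elementary_net_flux(k1, km1, k2, km2, k3, km3, a, b, e):
    """Net flux from the steady state of E + A <-> EA <-> EB <-> E + B."""
    # Unknowns [E, EA, EB]: enzyme conservation, d[EA]/dt = 0, d[EB]/dt = 0.
    M = [[1, 1, 1],
         [k1 * a, -(km1 + k2), km2],
         [km3 * b, k2, -(km2 + k3)]]
    rhs = [e, 0, 0]
    d = det3(M)
    sol = []
    for col in range(3):  # Cramer's rule
        Mc = [row[:] for row in M]
        for i in range(3):
            Mc[i][col] = rhs[i]
        sol.append(det3(Mc) / d)
    free_e, _ea, eb = sol
    return k3 * eb - km3 * b * free_e


def phenomenological_constants(k1, km1, k2, km2, k3, km3):
    """Specificity and Michaelis constants as functions of elementary constants."""
    d0 = km1 * km2 + km1 * k3 + k2 * k3
    k_a = k1 * k2 * k3 / d0
    k_b = km1 * km2 * km3 / d0
    km_a = d0 / (k1 * (k2 + km2 + k3))
    km_b = d0 / (km3 * (km1 + k2 + km2))
    return k_a, k_b, km_a, km_b


def reversible_mm(k_a, k_b, km_a, km_b, a, b, e):
    """Assumed reversible Michaelis-Menten form in specificity constants."""
    return e * (k_a * a - k_b * b) / (1 + a / km_a + b / km_b)


# Hypothetical values; Fractions keep the comparison exact.
rates = [Fraction(x) for x in (2, 1, 3, 1, 4, 1)]
a, b, e = Fraction(5), Fraction(2), Fraction(1)
v_elementary = elementary_net_flux(*rates, a, b, e)
v_phenomenological = reversible_mm(*phenomenological_constants(*rates), a, b, e)
```

The four phenomenological constants share the same denominator built from elementary constants, which is one way to see that they cannot be varied independently.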

### 4.2 Thermodynamic Haldane equation

For reversible reactions, with ordered binding and dissociation of substrates and products, the thermodynamic Haldane equation (11) relates standard change in chemical potential, $\Delta\mu_{j}^{o}$, to a ratio of forward and reverse elementary rate constants

For reversible reactions where the kinetic mechanism is not ordered, such as the ’Ping Pong Bi Bi’ mechanism (11), or random, such as the ’Rapid Equilibrium Random Uni Bi Bi’ mechanism, the thermodynamic Haldane is no longer simply the ratio of two monomials formed from the forward and reverse rate constants (11). Instead, the thermodynamic Haldane is a more complicated polynomial. In such a situation one may use a transformation from polynomial to linear, linear-logarithmic equations in order to incorporate the thermodynamic Haldane equation into the system of constraint equations. The transformation from polynomial to linear, linear-logarithmic equations is discussed in section 4.3.

For ordered kinetic mechanisms, given the logarithmic identity

and the relation between standard change in chemical potential for an elementary reaction, and a single column of the stoichiometric matrix, $\mu^{o}\cdot\mathbf{S}_{j}=\Delta\mu_{j}^{o}$, one can rewrite Eq. 8 in linear homogeneous form in terms of logarithmic elementary kinetic parameters

Thus, standard chemical potential, $0<\mu^{o}\in\mathbb{R}^{1,m}$, can be used to linearly constrain the relationship between logarithmic elementary kinetic constants. Generalised to a system of reactions, in matrix form, we have

where **I** is the identity matrix, $\mathbf{u}_{i}^{o}\equiv\exp\left(\frac{\mu_{i}^{o}}{RT}\right)$, $0<\mathbf{u}^{o}\in\mathbb{R}^{m}$, and $\mathbf{k}_{+}\in\mathbb{R}^{\tilde{n}}$ and $\mathbf{k}_{-}\in\mathbb{R}^{\tilde{n}}$ are vectors of elementary kinetic constants, ordered appropriately according to each internal reaction. An identity matrix is a square matrix with ones on the diagonal and zeros elsewhere. Eq. 11 constrains the kinetic parameters of the model to be consistent with thermodynamics, a constraint often overlooked in kinetic models (17) and likewise in numerical sampling of candidate values for kinetic parameters (19).
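To illustrate why a Haldane-type relation is linear in logarithmic kinetic constants, the sketch below assumes an ordered three step mechanism with hypothetical forward and reverse elementary constants, and the standard relation that the equilibrium constant equals both exp(−Δμ°/RT) and the product of forward constants divided by the product of reverse constants. The sign convention, the function names and the numbers are assumptions made for illustration.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K)
T = 310.15          # assumed temperature in K


def haldane_residual(delta_mu0, k_forward, k_reverse):
    """Log-linear residual; zero when the constants agree with delta_mu0."""
    return (sum(math.log(k) for k in k_forward)
            - sum(math.log(k) for k in k_reverse)
            + delta_mu0 / (R * T))


def last_reverse_constant(delta_mu0, k_forward, k_reverse_partial):
    """Choose the remaining reverse constant so that the residual vanishes."""
    log_k = (sum(math.log(k) for k in k_forward)
             - sum(math.log(k) for k in k_reverse_partial)
             + delta_mu0 / (R * T))
    return math.exp(log_k)


# Hypothetical values: standard change in chemical potential in kJ/mol.
delta_mu0 = -5.0
k_forward = [2.0, 3.0, 4.0]
k_reverse_partial = [1.0, 1.0]
km3 = last_reverse_constant(delta_mu0, k_forward, k_reverse_partial)
k_reverse = k_reverse_partial + [km3]

# Equilibrium constant implied by the rate constants.
k_eq = math.prod(k_forward) / math.prod(k_reverse)
```

Sampling five of the six constants freely and solving for the sixth, as done here, is one simple way to keep sampled kinetic parameters thermodynamically consistent.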

### 4.3 Kinetic rate equations

For reversible reactions, elementary kinetic variable rate equations express net flux as a function of elementary kinetic variables, metabolite concentrations and enzyme concentration. By splitting a reversible rate equation, into separate unidirectional forward and reverse flux equations, we have two polynomial equations relating elementary flux, metabolite and enzyme concentrations. For an entire metabolic network, all elementary unidirectional rate equations, and the relations between phenomenological and elementary kinetic variables, form a system of implicit homogeneous polynomials. We demonstrate how any system of polynomial equations can be algebraically transformed into a conjugate set of linear equalities, logarithmic equalities and logarithmic inequalities. This conjugate linear-logarithmic system has the same mathematical form as the aforementioned linear-logarithmic flux constraints representing mass balance and thermodynamic feasibility.

Consider a *d* dimensional system of *m* homogeneous polynomials, with *n* distinct monomials. Here *m* is the number of metabolites + enzymes in a system, whilst *d* is the number of unidirectional reactions + kinetic constants + metabolites + enzymes. A distinct monomial is a single term in a polynomial ignoring coefficients, e.g. 6*x*^{2}*y*^{3} and −4*x*^{2}*y*^{3} count as the same distinct monomial. The number of distinct monomials depends on the type of polynomial kinetic expression used to model each reaction (see Supplementary Material D for a worked toy example). Any system of polynomials, $i\in 1\ldots m$, can be represented, without algebraic manipulation, in the form

$$\sum_{j=1}^{n}\mathbf{A}_{i,j}\prod_{r=1}^{d}\mathbf{y}_{r}^{\mathbf{Q}_{j,r}}=0, \qquad (12)$$

where **A**_{i,j} is the coefficient of monomial *j* in polynomial *i*, **y**_{r} is a variable and **Q**_{j,r} is the exponent of variable **y**_{r} within distinct monomial *j*. The coefficients are integers in the matrix $\mathbf{A}\in\mathbb{Z}^{m,n}$. We take $0<\mathbf{y}\in\mathbb{R}^{d}$ since it is non-physical to consider variables or concentrations of zero or less. Each row of $\mathbf{Q}\in\mathbb{R}^{n,d}$ corresponds to one distinct monomial and each column of **Q** corresponds to one variable.

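Now we arrive at the first key algebraic step. Let $\mathbf{y}_{r}\equiv\exp(\mathbf{z}_{r})$, then, by the exponential identities, $(e^{a})^{b}=e^{ab}$ and $e^{a}e^{b}=e^{a+b}$, we have

$$\sum_{j=1}^{n}\mathbf{A}_{i,j}\exp(\mathbf{Q}_{j}\cdot\mathbf{z})=0.$$

If we consider $\exp(\mathbf{Q}_{j}\cdot\mathbf{z})$, $j\in 1\ldots n$, as one component of an *n* dimensional vector then Eq. 12 condenses to become

$$\mathbf{A}\cdot\exp(\mathbf{Q}\cdot\mathbf{z})=\mathbf{0}, \qquad (13)$$

where $\mathbf{z}\in\mathbb{R}^{d}$. This prepares the way for the second key step.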
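A small numerical sketch may make the coefficient matrix **A** and exponent matrix **Q** concrete. It assumes a single reversible isomerase with mass action kinetics, so that the two polynomials are *v*_f − *k*_f·*a* = 0 and *v*_r − *k*_r·*b* = 0; the variable ordering, names and values are hypothetical. Each monomial is evaluated as the exponential of a row of **Q** times the logarithm of the variables.

```python
import math

# Variables y = [v_f, v_r, k_f, k_r, a, b] for a hypothetical reaction A <-> B.
# Each row of Q holds the exponents of one distinct monomial.
Q = [
    [1, 0, 0, 0, 0, 0],  # v_f
    [0, 0, 1, 0, 1, 0],  # k_f * a
    [0, 1, 0, 0, 0, 0],  # v_r
    [0, 0, 0, 1, 0, 1],  # k_r * b
]
# Each row of A holds the coefficients of one polynomial over the monomials.
A = [
    [1, -1, 0, 0],   # v_f - k_f * a = 0
    [0, 0, 1, -1],   # v_r - k_r * b = 0
]


def monomials(Q, y):
    """Evaluate x_j = exp(Q_j . z) with z = ln(y), for strictly positive y."""
    z = [math.log(y_r) for y_r in y]
    return [math.exp(sum(q * z_r for q, z_r in zip(row, z))) for row in Q]


def polynomial_residuals(A, Q, y):
    """Evaluate A . exp(Q . ln(y)); all zeros when y solves the system."""
    x = monomials(Q, y)
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


# Hypothetical values satisfying both rate laws: v_f = 3 * 2, v_r = 4 * 0.5.
y = [6.0, 2.0, 3.0, 4.0, 2.0, 0.5]
x = monomials(Q, y)
residuals = polynomial_residuals(A, Q, y)
```

In this mass action case each polynomial has only two monomials, so the same constraint can be written directly as a linear equation in logarithms, for example ln *v*_f − ln *k*_f − ln *a* = 0.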

In the field of algebraic geometry, in particular, the study of algebraic varieties, which are geometric manifestations of solution sets of systems of polynomial equations, the matrix **Q** composed of exponents of distinct monomials, leads to the study of mathematical structures known as Newton polytopes (22). A polytope is a generalisation of a polygon to arbitrary dimension. To construct a Newton polytope, one considers the exponents of a single distinct monomial, a row in **Q**, as a vertex of this polytope. A vertex is a generalisation of a corner, to arbitrary dimension. The set of rows $j\in 1\ldots n$ of **Q** form such a set of vertices that define a polytope. It is a fundamental result from convex geometry that there are two equivalent ways to represent a polytope, in terms of a set of vertices, a vertex representation, or in terms of a set of linear equations and inequalities, a half-space representation. Facet enumeration, using a convex hull algorithm, can transform a vertex representation into a half-space representation. The Newton polytope of a polynomial system is the convex hull of **Q** when one considers each row as a vertex. We invite the interest of the algebraic geometry community at this stage as we depart from the usual perspective and instead consider each column of **Q** as the vertex of a polytope and use a vertex enumeration algorithm to compute the ’dual Newton polytope’.
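The equivalence of vertex and half-space representations can be seen in a two dimensional toy case. The sketch below uses Andrew's monotone chain algorithm, a standard planar convex hull method swapped in here for illustration, to turn a set of hypothetical exponent vectors into inequalities of the form n · x ≤ c. It only works in the plane, and the polytopes arising from realistic models would need a general-dimension facet enumeration code.

```python
def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, p, q):
        return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def half_spaces(hull):
    """One inequality n . x <= c per edge, with n the outward normal."""
    facets = []
    for i, p in enumerate(hull):
        q = hull[(i + 1) % len(hull)]
        dx, dy = q[0] - p[0], q[1] - p[1]
        n = (dy, -dx)  # outward for counter-clockwise ordering
        c = n[0] * p[0] + n[1] * p[1]
        facets.append((n, c))
    return facets


# Hypothetical exponent vectors, each treated as a point in the plane.
points = [(0, 0), (2, 0), (0, 2), (1, 1)]
hull = convex_hull(points)
facets = half_spaces(hull)
```

For these points the hull is the triangle with vertices (0, 0), (2, 0) and (0, 2); the point (1, 1) lies on an edge and is therefore not a vertex, yet it still satisfies every inequality.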

Since the polynomials we consider are homogeneous, no entire row of **Q** is devoid of nonzero coefficients. **Q**_{j,r} = 0 when a monomial indexed *j* has a variable indexed *r* with a zero exponent since $\mathbf{y}_{r}^{0}=1$. Since $\mathbf{Q}_{j,r}\ge 0$, we consider each column of **Q**, which is formed from the exponents of a single variable **y**_{r} across all *n* distinct monomials of the polynomial system, as a vertex of a polytope in the positive quadrant. Using facet enumeration we can compute the convex hull of **Q** and derive an equivalent half-space representation of the dual Newton polytope as a system of affinely independent inhomogeneous linear equalities and inequalities with new matrices $\mathbf{B}\in\mathbb{R}^{b,n}$, $\mathbf{C}\in\mathbb{R}^{c,n}$, and vectors $\mathbf{b}\in\mathbb{R}^{b}$, $\mathbf{c}\in\mathbb{R}^{c}$, which by the definition of **Q** as a set of vertices of this ’dual Newton polytope’ satisfy the equations

i.e. the new matrices **B** and **C**, with respective vectors **b** and **c**, are defined to be the half-space representation of the same polytope with vertex representation defined by the set of vertices in **Q**. These are equivalent representations of the same convex polytope. Now, let $\mathbf{Q}\cdot\mathbf{z}\equiv\ln(\mathbf{x})$, then Eq. 13 becomes

$$\mathbf{A}\cdot\mathbf{x}=\mathbf{0}, \qquad (16)$$

and equations 14 and 15 become

Intuitively, if one considers Eq. 16 as a polytope in linear scale, and Eqs. 17 and 18 as a polytope in logarithmic scale, then the simultaneous solution to equations 16, 17 and 18 is the intersection of a polytope in linear scale with the exponential of a polytope on a logarithmic scale (see Supplementary Figure 3).

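Since we let $\mathbf{y}_{r}\equiv\exp(\mathbf{z}_{r})$ and $\mathbf{Q}\cdot\mathbf{z}\equiv\ln(\mathbf{x})$, it is consistent to relate the new vector **x** to the original vector **y** with $\mathbf{Q}\cdot\ln(\mathbf{y})=\ln(\mathbf{x})$. This relation can be equivalently expressed as

$$\left[\mathbf{Q},\,-\mathbf{I}\right]\cdot\begin{bmatrix}\ln(\mathbf{y})\\ \ln(\mathbf{x})\end{bmatrix}=\mathbf{0}, \qquad (19)$$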

where $\mathbf{I}\in\mathbb{R}^{n,n}$ is an identity matrix. If the number of different monomials is greater than the number of dimensions of the polynomial system, $\mathbf{Q}\in\mathbb{R}^{n,d}$, *n* > *d*, then $\mathbf{Q}\cdot\ln(\mathbf{y})=\ln(\mathbf{x})$ is overdetermined. In this case, not every **x** will necessarily correspond to a solution **y**. However, since Eq. 19 is of the same mathematical form as Eq. 17, we can incorporate it into the system of constraints to guarantee a solution **y** for each **x**. The output of the transformation from a system of polynomial equations in **y**, using exponential identities and facet enumeration, is a set of linear equations in linear variables, plus a set of equalities and inequalities in ln(**x**) and ln(**y**). In the case of a polynomial system of kinetic equations, **y** encompasses external fluxes, $\overline{\mathbf{v}}$, internal fluxes, $\tilde{\mathbf{v}}$, forward and reverse elementary kinetic variables, **k**_{+} and **k**_{−}, metabolite concentrations, **m**, and enzyme concentrations, **e**, i.e. $\mathbf{y}=[\overline{\mathbf{v}},\tilde{\mathbf{v}},\mathbf{k}_{+},\mathbf{k}_{-},\mathbf{m},\mathbf{e}]^{\top}\in\mathbb{R}^{\frac{5n}{2}+m}$. Here we assume a unique enzyme for each reaction, so the dimension of **y** will be lower if the same enzyme catalyses multiple reactions. Thus we have the combined system of *m* linear equalities on linear variables, *b* + *n* linear equalities on logarithmic variables, and *c* linear inequalities on logarithmic variables.

where the various subscripts to **Q** denote the columns appropriate to the variables indicated. A detailed example of the transformation is given in Supplementary Material D. In the case of (pseudoelementary) mass action kinetics, where flux is a function of a single monomial, vertex enumeration is not required to exactly linearise the equations in terms of logarithmic variables. In section 7 we provide such an example, with an application to central *E. coli* metabolism. See also Supplementary Material E for an example of this exact linearisation for a few reversible reactions.
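The relation **Q** · ln(**y**) = ln(**x**) can be illustrated with a small numerical sketch. The exponent matrix, the variable names and the numerical values below are hypothetical and chosen only for illustration; they are not taken from the Supplementary Material. The example has three monomials in two variables (*n* > *d*), so it also shows why an arbitrary **x** need not correspond to any **y**.

```python
import math

# Hypothetical exponent matrix Q (assumption): one row per monomial,
# one column per variable. Variables y = [k, m]; monomials x = [k, m, k * m**2].
Q = [
    [1, 0],
    [0, 1],
    [1, 2],
]

def log_monomials(Q, y):
    """Return ln(x) = Q . ln(y) for strictly positive y."""
    ln_y = [math.log(v) for v in y]
    return [sum(q * l for q, l in zip(row, ln_y)) for row in Q]

def is_consistent(ln_x, tol=1e-9):
    """For this Q, row 3 = row 1 + 2 * row 2, so ln x3 = ln x1 + 2 ln x2."""
    return abs(ln_x[2] - (ln_x[0] + 2.0 * ln_x[1])) <= tol

y = [2.0e3, 5.0e-4]                 # illustrative rate constant and concentration
ln_x = log_monomials(Q, y)          # linear in the logarithmic variables
flux_direct = y[0] * y[1] ** 2      # the same monomial evaluated in linear scale

# An x chosen freely in R^3 generally violates the consistency condition.
ln_x_arbitrary = [math.log(2.0), math.log(3.0), math.log(7.0)]
```

Here `math.exp(ln_x[2])` reproduces `flux_direct`, whereas `ln_x_arbitrary` fails `is_consistent`, which is the situation the additional constraint of Eq. 19 is meant to exclude.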

## 5 Thermodynamic and kinetic inequalities

Broad bounds on metabolite or enzyme concentrations, fluxes or phenomenological kinetic variables are available from theoretical consideration of intracellular kinetics (21), guided in specific cases by experimental data. In all cases, whether due to lack of knowledge or inherent errors in experimental determination, such data result in a pair of bounding inequalities on the variable of interest. For example, estimates of standard chemical potential come with a margin of uncertainty. The effect of this uncertainty on prediction of reaction directionality is discussed elsewhere (31, 52). This uncertainty may be represented by inequalities on the allowed range of standard chemical potential

Likewise, broad estimates on physiologically realistic intracellular metabolite concentrations (20) lead to similar bounding inequalities. When experimental metabolite concentrations are available, these data lead to tighter bounds on the model. Model representation of inter-compartmental concentration gradients is essential to permit net diffusive flux of transport reactions. Such considerations, beyond the realm of flux balance analysis, are vital to permit *in silico* growth of more realistic models which include potential gradients, in addition to fluxes. Of particular importance is the difference in concentration of *H*^{+} ions between separate compartments of the electrochemical chain (23). This proton-motive force can be represented in compartmentalised, charge balanced, constraint-based models (20) when experimental data on intra- and periplasmic pH are included.

Establishing tight bounds on genome wide enzyme concentrations is a significant problem. Given the high level of noise in both microarray (29) and quantitative proteomics data (16), it is difficult to draw conclusions on the relationship between transcript and protein abundance. If one assumes enzyme concentration is proportional to transcript abundance in global gene expression studies (14), then enzyme concentration can be quantitatively estimated through protein molecular weight, estimates of total dry weight of protein per cell and thermodynamically grounded preprocessing of raw microarray data (9). At best, such effort with experimental data gives rise to loose bounds on intracellular enzyme concentration for a particular nutrient medium appropriate to the microarray data (8). However, this may be important if the upper bound is so close to zero as to render flux through that particular enzyme inconsequential with respect to the flux of active pathways. As such, detecting the presence versus absence of a transcript is a crucial output of microarray preprocessing (10). As experimental systems biology continues its rapid progress, one can expect tightening of these inequalities.
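A pair of bounds on a strictly positive variable, such as those discussed in this section, can be written as two one-sided inequalities on its logarithm. The sketch below is a minimal illustration under stated assumptions: the function names, the row layout and the "greater than or equal" orientation are choices made here for illustration and are not the authors' notation.

```python
import math

def log_bound_rows(index, lower, upper, n_vars):
    """Encode lower <= x[index] <= upper as two rows (r, c) with r . w >= c,
    where w = ln(x). The sign of the upper bound inequality is reversed so
    that both inequalities point the same way."""
    if not 0.0 < lower <= upper:
        raise ValueError("bounds must be strictly positive with lower <= upper")
    lo_row = [0.0] * n_vars
    lo_row[index] = 1.0             #  w_i >=  ln(lower)
    up_row = [0.0] * n_vars
    up_row[index] = -1.0            # -w_i >= -ln(upper)
    return [(lo_row, math.log(lower)), (up_row, -math.log(upper))]

def satisfies(rows, w, tol=1e-12):
    """True if every inequality r . w >= c holds within tol."""
    return all(sum(r * v for r, v in zip(row, w)) >= c - tol for row, c in rows)

# Illustrative bound (assumption): a concentration between 1e-6 and 1e-2.
rows = log_bound_rows(index=1, lower=1e-6, upper=1e-2, n_vars=3)
w_inside = [0.0, math.log(1e-4), 0.0]
w_outside = [0.0, math.log(1e-1), 0.0]
```

With these values `satisfies(rows, w_inside)` holds and `satisfies(rows, w_outside)` does not, since the second candidate exceeds the upper bound.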

## 6 Integrated stoichiometric, thermodynamic and kinetic constraint-based modelling

The rationale for the reformulation of constraints becomes apparent when one compares the mathematical form of the resulting equations. Eq. 1, the steady state mass conservation invariant, and Eq. 20, the linear portion of the reformulated kinetic constraints, are of the same mathematical form and together can be represented by

where the stoichiometric matrix is partitioned into its external and internal columns. Eq. 4, the novel compact reformulation of energy conservation and the second law of thermodynamics, Eq. 11, the linear reformulation of the thermodynamic Haldane relation in terms of logarithmic kinetic variables, and Eq. 21, the logarithmic equality portion of the reformulated kinetic constraints, are all linear forms in logarithms and together can be represented by

Experimental data or theoretical estimates of kinetic or thermodynamic variables give rise to bounding inequalities such as Eq. 23. By taking the logarithm, splitting paired bounding inequalities into two inequalities and reversing the sign of the upper bound inequality, we obtain the same form as Eq. 22, the logarithmic inequality portion of the reformulated kinetic constraints. Combined they are of the form

Biologically, equations 24, 25 and 26 represent the combination of many different constraints, but mathematically they are of the same form as equations 20, 21 and 22, derived from a system of polynomial equations. If we let **w** ≡ ln(**x**) then equations 20, 21 and 22 are mathematically equivalent to

but this has the algorithmic advantage that a wide dynamic range of variables is accommodated in logarithmic scale. In fact, all constraints formulated thus far could be expressed as a system of polynomial equations by applying the inverse of the transformation sequence outlined in subsection 4.3.
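As a rough illustration of the composite form discussed in this section, the sketch below checks whether a candidate **w** = ln(**x**) satisfies linear equalities on exp(**w**), linear equalities on **w**, and linear inequalities on **w**. The block names `A`, `a`, `B`, `b`, `C`, `c`, the inequality orientation, and the single reversible reaction with its numerical values are assumptions made for illustration; this is a feasibility check for a toy system, not the authors' formulation or solver.

```python
import math

def dot(row, vec):
    return sum(r * v for r, v in zip(row, vec))

def is_feasible(w, A, a, B, b, C, c, tol=1e-9):
    """Check A . exp(w) = a, B . w = b and C . w >= c within a tolerance."""
    x = [math.exp(wi) for wi in w]
    lin_ok = all(abs(dot(r, x) - ai) <= tol for r, ai in zip(A, a))
    log_eq_ok = all(abs(dot(r, w) - bi) <= tol for r, bi in zip(B, b))
    log_ineq_ok = all(dot(r, w) >= ci - tol for r, ci in zip(C, c))
    return lin_ok and log_eq_ok and log_ineq_ok

# Toy reversible reaction A <-> B with x = [v_f, v_r, k_f, k_r, m_A, m_B].
K_eq = 10.0                               # assumed equilibrium constant
A = [[1, -1, 0, 0, 0, 0]]                 # net flux: v_f - v_r = 0.5
a = [0.5]
B = [
    [1, 0, -1, 0, -1, 0],                 # ln v_f = ln k_f + ln m_A
    [0, 1, 0, -1, 0, -1],                 # ln v_r = ln k_r + ln m_B
    [0, 0, 1, -1, 0, 0],                  # Haldane: ln k_f - ln k_r = ln K_eq
]
b = [0.0, 0.0, math.log(K_eq)]
C = [
    [0, 0, 0, 0, 1, 0],                   #  ln m_A >=  ln(1e-6)
    [0, 0, 0, 0, -1, 0],                  # -ln m_A >= -ln(1e-1)
]
c = [math.log(1e-6), -math.log(1e-1)]

x_good = [1.0, 0.5, 100.0, 10.0, 0.01, 0.05]
x_bad = [50.0, 49.5, 100.0, 10.0, 0.5, 4.95]   # equalities hold, m_A too large
w_good = [math.log(v) for v in x_good]
w_bad = [math.log(v) for v in x_bad]
```

The second candidate satisfies every equality but violates the upper bound on the concentration of A, so only `w_good` passes `is_feasible`. Working with `w` rather than `x` keeps quantities that span many orders of magnitude on a comparable numerical scale.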

## 7 Application to *Escherichia coli*

We tested the predictive capacity of our approach by implementing mass conservation, energy conservation and the second law of thermodynamics for a core metabolic model of *E. coli* metabolism (*m* = 72 metabolites, 76 pairs of internal reactions and *ñ* = 19 net exchange reactions) (37). We assumed that each overall reaction followed pseudo-elementary mass action kinetics and thus applied thermodynamic Haldane constraints to the ratio of forward and reverse pseudo-elementary rate constants. The equality constraints took the general form

where the boundary condition on the mass conservation constraints, **a**, was derived from the optimal exchange fluxes obtained using flux balance analysis. For the latter, we fixed uptake/secretion rates for D-glucose, oxygen, ethanol, acetate, D-lactate, succinate, pyruvate and formate corresponding to aerobic growth on glucose minimal medium in a chemostat (Sample ID: GR04, (30)). As described previously (52), for all metabolites, we used known standard (Legendre) transformed chemical potentials, **u**^{o}′, back-calculated from experimentally measured equilibrium constants.
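The thermodynamic Haldane constraint on a pair of pseudo-elementary rate constants can be sketched as follows, using the standard relation between an equilibrium constant and the standard chemical potential change of a reaction. The function names, the units (kJ/mol), the temperature of 298.15 K and the example values are assumptions for illustration only and are not taken from the model described above.

```python
import math

R = 8.314462618e-3   # gas constant in kJ/(mol K)

def ln_keq(stoich_column, u0, temperature=298.15):
    """ln K_eq = -(sum_i s_i * u0_i) / (R T) for one reaction.

    stoich_column: stoichiometric coefficients (negative for substrates).
    u0: standard transformed chemical potentials in kJ/mol (assumed units).
    """
    delta_u0 = sum(s * u for s, u in zip(stoich_column, u0))
    return -delta_u0 / (R * temperature)

def ln_k_reverse(ln_k_forward, stoich_column, u0, temperature=298.15):
    """Haldane constraint in logarithmic variables:
    ln k_forward - ln k_reverse = ln K_eq."""
    return ln_k_forward - ln_keq(stoich_column, u0, temperature)

# Illustrative reaction A -> B with a standard potential drop of 5 kJ/mol.
s = [-1, 1]
u0 = [0.0, -5.0]
ln_kf = math.log(100.0)
ln_kr = ln_k_reverse(ln_kf, s, u0)
```

Because the constraint is a linear equality between `ln_kf` and `ln_kr`, fixing the standard potentials leaves only one of the two rate constants free for each reversible reaction.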

We used a bespoke solver to solve the non-linear, non-convex feasibility problem posed by Eqs. 28 and 29. Due to uncertainty in the absolute values of *in vivo* kinetic parameters, and the nature of available fluxomic data, we focused on comparing net flux vectors to evaluate the predictive capacity. Also, where two reactions occur sequentially in an unbranched pathway, we compare only one predicted flux with fluxomic data. Figure 2 compares linear, linear-logarithmic prediction of net flux rates with fluxomic data, and also with net flux rates predicted by flux balance analysis. With flux balance analysis, there typically exist alternative optimal net flux vectors satisfying the same optimal value of the linear objective function, in this case, the biomass production rate. Therefore, we used flux variability analysis (34) to predict the maximum and minimum net flux rates consistent with mass conservation at an optimal biomass production rate and the aforementioned fixed exchange reaction rates.

In comparison with flux balance analysis, our linear, linear-logarithmic method provides a more accurate prediction of the flux split between the decarboxylating oxidative branch of the pentose phosphate pathway and the proximal reactions of glycolysis. Our method also improves the flux prediction through the glyoxylate bypass of the distal, oxidative part of the tricarboxylic acid cycle. Overall our flux prediction is superior to that by flux balance analysis; however, the prediction of flux through other reactions, such as the balance between different fermentative end products, still does not agree well with the fluxomic data. This preliminary comparison with experimental data indicates that the inclusion of thermodynamic constraints refines our ability to predict *in vivo* flux. Further comparison with fluxomic data from various experimental conditions will be required for a definitive conclusion on predictive capacity.

We also tested our approach with a genome scale model of *E. coli* metabolism, iAF1260 (20) (data not shown). However, without additional inequality constraints, it is not possible to represent the qualitatively assigned local reaction directionalities which accompany the iAF1260 model (52). Such constraints seem essential for a reliable prediction using flux balance analysis, but their inclusion into the current formulation presents a significant numerical analysis challenge. The main difficulty is that local reaction directionality constraints may give rise to a feasible flux balance analysis problem, but an infeasible problem when additional thermodynamic constraints are required to be satisfied. Moreover, when a solver fails to converge to the solution of a non-convex feasibility problem, it is presently difficult to differentiate between a limitation in the 'solving' capability of the algorithm and a problem that was infeasible to begin with. This difficulty is an inherent feature of many non-convex feasibility problems. We continue efforts to add the capability to reliably solve such larger scale problems with inequality constraints.

## 8 Discussion

We have demonstrated how the mathematical constraints representing steady state mass conservation, energy conservation, the second law of thermodynamics, reversible polynomial kinetics, and experimental and theoretical estimates of thermodynamic and kinetic variables can all be integrated into the composite mathematical form of equations 27. This reformulation is mathematically exact in the sense that it uses no approximations around arbitrary reference states, in contrast to the 'linlog kinetics' approach (45). The thermodynamic reformulation relies on a fundamental equation in non-equilibrium steady state thermodynamics (5), complementing existing work of Beard et al. (6). The reformulation of reversible polynomial kinetic equations relies on judicious application of exponential identities, their logarithmic analogues, and a novel application of a well known principle from convex geometry. This principle is the representational equivalence of a polytope as either a set of vertices (a vertex representation) or a set of linear equations and inequalities (a half-space representation). The mathematical incorporation of numerical bounds on experimental and theoretical estimates of thermodynamic and kinetic variables, detailed in table 1, follows trivially in this constraint-based approach.

**...**

The mathematical generality of the integrated system of linear, linear-logarithmic constraint equations 27 lies in the possibility to reformulate any system of polynomial equations in such a form. For constraint-based modelling of biological networks, this means that, theoretically, any new nonlinear constraints, formalised as an arbitrary system of implicit polynomials, can be integrated with the existing constraint-based modelling of steady state metabolism. Practically, computational complexity may limit the size of polynomial system that can be transformed in reasonable time. In systems biology, variants of vertex enumeration algorithms are used to compute a convex basis for the stoichiometric matrix (44). These convex bases correspond directly to stoichiometrically balanced pathways of contiguous reactions. Facet enumeration is dual to vertex enumeration, so the worst case computational complexity of vertex enumeration is identical to that of facet enumeration. Given **Q** ∈ ℝ^{n,d}, the latest algorithms (4) find all facets in worst case time of *O*(*nd*^{2}) per facet (3).
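The equivalence between a vertex representation and a half-space representation can be illustrated in two dimensions, where facet enumeration reduces to computing a convex hull and reading one inequality off each edge. The sketch below uses Andrew's monotone chain algorithm, which is a stand-in chosen for brevity and is not the facet enumeration algorithm cited above; the function names and the example points are hypothetical.

```python
def cross(o, p, q):
    """z-component of (p - o) x (q - o); positive for a counter-clockwise turn."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def facets(points):
    """Half-planes (a1, a2, b) with a1*x + a2*y <= b, one per hull edge."""
    hull = convex_hull(points)
    out = []
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        # The outward normal of a counter-clockwise edge (dx, dy) is (dy, -dx).
        a1, a2 = y2 - y1, -(x2 - x1)
        out.append((a1, a2, a1 * x1 + a2 * y1))
    return out

def inside(point, half_planes, tol=1e-12):
    return all(a1 * point[0] + a2 * point[1] <= b + tol for a1, a2, b in half_planes)

# Vertex representation: corners of the unit square plus one interior point.
vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
half_planes = facets(vertices)
```

The interior point contributes no facet, so the square is described by four inequalities. In higher dimensions the number of facets can grow rapidly with the number of vertices, which is why the cost per facet matters.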

The number of facets contributes to the size of the constraint satisfaction problem through the matrix **A** in Eq. 24 and matrix **B** in Eq. 25. To our knowledge, there is no closed form for the number of facets corresponding to a given set of vertices. However, it is possible to estimate the number of facets, and hence the computation time, prior to facet enumeration (4). Assuming mass action kinetics for each reaction, the integration of kinetic constraints does not require facet enumeration (see Supplementary Material E). In this case the numerical constraint feasibility problem requires simultaneously solving *m* linear mass conservation constraints (one for each metabolite), Eq. 24, $\frac{5n}{2}+m$ linear thermodynamic and kinetic equalities on logarithmic variables (for $\frac{n}{2}$ reversible elementary reactions), Eq. 25, and *m* linear inequalities on the estimated range of *in vivo* standard chemical potential.

The motivation for this work arose out of a requirement to integrate various constraints into a concise mathematical form, yet preserve desirable characteristics of the equations such that they are amenable to numerical analysis. Even though the feasible set is non-convex, and therefore non-linear, the near-linear structure of equations 27, even with a combination of linear and logarithmic variables, indicates potential for scalable numerical analysis. Moreover, with a central metabolic model of *E. coli* metabolism, we demonstrated that our method can be readily applied to improve the prediction of *in vivo* fluxes, in comparison with flux balance analysis. Nevertheless, future biological applications will be required in order to establish the generality of such conclusions for growth of other organisms in a range of different environments.

With regard to genome scale models of metabolism, the current numerical challenge lies in solving this non-convex feasibility problem when there are non-trivial inequality constraints. In practice, the current difficulty is distinguishing between failure of the solver to converge due to (i) problem infeasibility caused by overly tight inequality constraints, and (ii) a shortcoming in algorithm design. It is important to realise that this same issue arises with any system of constraint equations and inequalities defining a non-convex feasible set (36). It is not particular to our reformulation, and can occur with large systems of polynomial rate equations without inequalities.

The numerical solution of large systems of polynomial rate equations, even without inequality constraints, is as yet an open research question (36). Further work needs to be done either to develop an algorithmic test for infeasibility of linear, linear-logarithmic equations with non-trivial inequalities, or to develop an algorithmic approach which is guaranteed to converge to a solution, if a solution exists. It is interesting to note that systems of linear, linear-logarithmic equations also arise in the classical problem of predicting metabolite concentrations at equilibrium (50, 46). There, the requirement for equilibrium concentration to be positive requires a trivial inequality constraint and the algorithms for solving such systems can be proven to converge (50). It remains to be seen if the successful algorithmic approaches from chemical reaction equilibrium analysis can be adapted to solve our novel non-equilibrium formulation with non-trivial inequality constraints.

## Acknowledgements

The authors would like to thank Neema Jamshidi & Hong Qian for critical reading of the manuscript. R.M.T.F. was supported by a National University of Ireland, Galway, Science Faculty Fellowship. I.T. was supported by NIH grant 5R01GM057089-11.

## Footnotes

**Publisher's Disclaimer: **This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

## Contributor Information

R.M.T. Fleming, Center for Chromosome Biology, School of Natural Sciences, National University of Ireland, Galway.

I. Thiele, Center for Systems Biology, University of Iceland, Reykjavik, Iceland.

G. Provan, Department of Computer Science, University College Cork.

H.P. Nasheuer, Center for Chromosome Biology, School of Natural Sciences, National University of Ireland, Galway, Ireland, and Systems Biology Ireland, Galway, Ireland.

## References

*Escherichia coli* K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Molecular systems biology. 2007;3:e121. [PMC free article] [PubMed]

*Escherichia coli* metabolism. Biophysical journal. 2006;90:1453–1461. [PMC free article] [PubMed]

*Escherichia coli* K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature. 2002;420:186–189. [PubMed]

*E. coli* to perturbations. Science (New York, N.Y.) 2007;316:593–597. [PubMed]

*Methanosarcina barkeri*. Biotechnology and bioengineering. 2001;75:170–180. [PubMed]

*Escherichia coli* and *Salmonella*: Cellular and Molecular Biology, ASM Press, chapter Reconstruction and use of microbial metabolic networks: the core *Escherichia coli* metabolic model as an educational guide. 2009 (in press).

*Springer Series in Chemical Physics*. New York: Springer; 2008.

*Escherichia coli*: I. Synthesis of biosynthetic precursors and cofactors. Journal of theoretical biology. 1993;165:477–502. [PubMed]

*Escherichia coli*. Biophysical chemistry. 2009;145:47–56. [PMC free article] [PubMed]

*E. coli*’s transcriptional and translational machinery: A knowledge-base, its mathematical formulation, and its functional characterization. PLoS computational biology. 2009;5:e1000312. [PMC free article] [PubMed]
