• We are sorry, but NCBI web applications do not support your browser and may not function properly. More information
Logo of genoresGenome ResearchCSHL PressJournal HomeSubscriptionseTOC AlertsBioSupplyNet
Genome Res. Sep 2004; 14(9): 1797–1805.
PMCID: PMC515326

Genome-Scale In Silico Models of E. coli Have Multiple Equivalent Phenotypic States: Assessment of Correlated Reaction Subsets That Comprise Network States

Abstract

The constraint-based analysis of genome-scale metabolic and regulatory networks has been successful in predicting phenotypes and useful for analyzing high-throughput data sets. Within this modeling framework, linear optimization has been used to study genome-scale metabolic models, resulting in the enumeration of single optimal solutions describing the best use of the network to support growth. Here mixed-integer linear programming was used to calculate and study a subset of the alternate optimal solutions for a genome-scale metabolic model of Escherichia coli (iJR904) under a wide variety of environmental conditions. Analysis of the calculated sets of optimal solutions found that: (1) only a small subset of reactions in the network have variable fluxes across optima; (2) sets of reactions that are always used together in optimal solutions, correlated reaction sets, showed moderate agreement with the currently known transcriptional regulatory structure in E. coli and available expression data, and (3) reactions that are used under certain environmental conditions can provide clues about network regulatory needs. In addition, calculation of suboptimal flux distributions, using flux variability analysis, identified reactions which are used under significantly more environmental conditions suboptimally than optimally. Together these results demonstrate the utilization of reactions in genome-scale models under a variety of different growth conditions.

Constraint-based modeling of reconstructed genome-scale metabolic networks has proven useful for understanding and predicting the genotype-phenotype relationship in microbes (Ibarra et al. 2002; Segre et al. 2002; Stelling et al. 2002; Fong et al. 2003; Forster et al. 2003; Price et al. 2003; Reed and Palsson 2003). Within this analysis framework, a variety of methods have been developed to characterize the metabolic steady-state flux solution space and select solutions within that space that might be physiologically relevant, including elementary mode analysis and extreme pathway analysis (Papin et al. 2003), flux balance analysis (Kauffman et al. 2003), MOMA (Segre et al. 2002), OptKnock (Burgard et al. 2003), flux-coupling (Burgard et al. 2004), random sampling (Almaas et al. 2004; Price et al. 2004; Wiback et al. 2004), and flux variability analysis (Mahadevan and Schilling 2003). Flux balance analysis uses linear optimization to find a flux distribution that maximizes a particular objective function (e.g., growth rate or ATP production; Varma and Palsson 1994; Kauffman et al. 2003). However, there are often multiple flux distributions that are equally optimal (value of the objective function is the same) giving rise to the concept of multiple alternate optima (Lee et al. 2000; Mahadevan and Schilling 2003). The existence of such multiple optima would correspond to the biological notion of silent phenotypes (Raamsdonk et al. 2001), that is, the same observed overall cellular function is achieved with different uses of underlying reaction networks (Fong et al. 2003).

Mixed-integer linear programming (MILP) has been used to study optimal solutions by identifying the minimum number of reactions needed for optimal growth (minimum reaction sets; Burgard and Maranas 2001; Burgard et al. 2001), as well as enumerating alternate basic optima for small metabolic networks, where the objective function takes on the same value but the flux distributions through the metabolic network are different (Lee et al. 2000; Phalakornkule et al. 2001). The alternate optimal flux distributions calculated using the MILP algorithm differ in that they all use a different set of reactions. For the small-scale networks previously studied (Lee et al. 2000; Phalakornkule et al. 2001), the number of alternate optima was on the order of 10 solutions. In genome-scale networks, a large number of network redundancies exist (Mahadevan and Schilling 2003) creating computational challenges in calculating all alternative optima. One approach to explore the range of optimal solutions is to fix the growth rate and then calculate the minimum and maximum flux values through each reaction in the network (referred to in this paper as `flux variability analysis'). The approach was recently used to analyze the initial genome-scale network for Escherichia coli (iJE660a; Edwards and Palsson 2000) for a select number of environmental conditions including aerobic growth on glucose, acetate, and D-lactate (Mahadevan and Schilling 2003).

Here we applied the recursive MILP algorithm (Lee et al. 2000) to calculate a subset of the alternate optimal solutions in an expanded genome-scale model of E. coli (iJR904; Reed et al. 2003) for a large number of media conditions (136). Comparisons between the optimal solutions characterized by the flux variability analysis and those calculated with an MILP approach found that the first 500 solutions identify all of the variable fluxes in the set of alternate optima, but do not fully capture the magnitude of the flux variability through the individual reactions. Looking at the alternate optima across the 136 simulated growth environments, a number of reactions were found to be utilized in all of the calculated alternative optima, whereas others were used under only a few environmental conditions. This collection of optimal solutions was studied to provide insights into the candidate regulatory mechanisms and strategies that would allow E. coli to grow optimally. Flux variability analysis was also used to study how fluxes are used in suboptimal solutions, and this approach identified reactions that if used would lead to suboptimal growth by the cell. Thus, the present study is the most comprehensive evaluation of alternative optima in genome-scale models to date.

RESULTS

We first computed a sample of 56,756 optimal solutions under 136 growth environments. We then assessed the properties of these solutions and found the correlated reaction subsets in these optimal solutions. The correlated reaction subsets were then compared to known regulon structures and expression profiling data sets. For aerobic and anaerobic growth on glucose, we compared the utilization of reactions in the alternative optima to described transcriptional regulation of the genes that are associated with some of these fluxes. Finally, we compared the reaction usage in optimal growth solutions to those in suboptimal growth solutions.

Comparing Results From MILP and Flux Variability Analysis

Previous metabolic networks studied using MILP have enumerated a finite number of flux distributions (Lee et al. 2000). A modified version of the algorithm (see Methods section) was implemented here and used to calculate basic optimal solutions for 88 aerobic and 48 anaerobic growth conditions, where different carbon sources were used (see Table 1 for a list of simulated carbon sources). The set of optima for a given environment will be referred to in this paper as the `condition-specific optima'. Application of the modified MILP algorithm to a genome-scale metabolic network showed that for many carbon sources the number of alternate optima was large, making it computationally difficult to enumerate all of the optimal solutions. As such, only the first 500 optimal solutions were calculated for each of the 136 (= 88 + 48) environmental conditions. For some simulated environments, aerobic growth with glycine as the sole carbon source and for all 48 anaerobic conditions, it was possible to enumerate all of the alternate optima.

Table 1.
Allowable Carbon Sources

To investigate how well the calculated condition-specific optima spanned or represented the actual range of optimal solutions, the number of varying fluxes and the range of those fluxes were calculated from the different sets of condition-specific optima and compared to the results using flux variability analysis. Figure 1 shows that for the 88 aerobic conditions, the first 150 optimal solutions are sufficient to identify all of the variable fluxes among the set of condition-specific optima, whereas determining the numerical range of these fluxes in the set sometimes required the calculation of more optima. For some carbon sources, the magnitude of the flux ranges when looking at all 500 optimal solutions was smaller than the actual flux ranges calculated using flux variability analysis. These results imply that the first 500 optimal solutions are adequate for getting a sample of the full set of condition-specific optima, while still remaining computationally tractable. With adequate sampling, the set of condition-specific optima can be further analyzed.

Figure 1
Comparisons of properties for sampled optima with all optima. The number of variable fluxes and the allowable ranges for these fluxes across all optima were calculated using a flux variability analysis. Each line is for one of the 88 carbon sources capable ...

Properties of Alternate Optima

The average optimal flux distribution, found among the 136 growth conditions considered, used 294 of the 931 internal reactions in iJR904, and for a given growth condition the average number of variable fluxes was 49 (with the number of variable fluxes typically being higher under aerobic conditions). Interestingly, with the exception of anaerobic growth on adenosine and deoxyadenosine, none of the exchange fluxes with the environment varied across alternate optima. This result indicates that the external state or phenotype of the cell is normally unique for a set of condition-specific optima, whereas flexibility in the internal fluxes accounts for the different optima. In all, there were 140 internal fluxes and five exchange fluxes that were variable under at least one of the 136 tested environmental conditions (see Supplemental material). Most of the flexibility in the network resides with reactions using different electron carriers, nucleotide salvage reactions, and central metabolic reactions. There were quite a few instances in central metabolism where the flux through a reaction was variable across the different optima (most of glycolysis, TCA cycle, anaplerotic reactions, and oxidative phosphorylation reactions).

The complete set of alternate optima, both aerobic and anaerobic, are a set of 56,756 flux distributions that were further studied—grouped together they are referred to here as the `mixed optima'. For each reaction in the network, the fraction of the optimal solutions, in the set of mixed optima, which use that particular reaction (fopt) was calculated (Fig. 2A, also available in Supplemental material). Reactions in the iJR904 model were previously assigned to different metabolic subsystems based on metabolic function, and Figure 2B shows for each subsystem how many reactions are assigned to that subsystem and the distribution of fractional usage within this subsystem. For example, there are 40 oxidative phosphorylation reactions: 65% of these are never used in any of the mixed optima, 20% have a fopt between 0 and 0.25, 3% have a fopt between 0.25 and 0.5, 8% have a fopt between 0.75 and 1.0, and 5% are used in all optima.

Figure 2
Reaction usage in optimal flux distributions. (A) shows for each reaction in the metabolic network, what fraction of the optimal flux distributions utilize this reaction (fopt). The reactions are then rank-ordered by frequency of use in optimal flux distributions. ...

A total of 201 reactions were used in all optimal solutions across the different environmental conditions; these include reactions involved in amino acid metabolism (cys, his, ile, leu, lys, met, phe, and tyr—the amino acids that cannot serve as carbon sources), folate metabolism, and membrane lipid biosynthesis. These 201 reactions are needed for optimal growth across all 136 simulated growth environments, and are related to the intersection of minimal reactions sets (Burgard and Maranas 2001; Burgard et al. 2001) for the different growth environments. Comparisons to experimental gene essentiality data for growth on rich media (Gerdes et al. 2003) shows that of the 201 reactions, 81 are associated with genes essential for growth on rich media, 20 have multiple isozymes explaining why single knockouts might not have been essential experimentally, and another 20 do not have any ORFs associated with it in iJR904. For the remaining 80 reactions, the rich media might contain metabolites that cannot be used as sole carbon sources, allowing for some of these reactions to be unessential. All mixed optima use histidine biosynthesis reactions, but if histidine was present in the growth medium these associated genes would be unessential. These 201 reactions are needed for optimal growth across the tested environmental conditions, but not necessarily for suboptimal growth, further explaining why not all of the reactions were associated with lethal genes.

In contrast to the 201 reactions that are used across all of the mixed optima, a number of reactions (351) were never utilized by any of the optimal solutions for any of the tested environments. These include 185 blocked reactions—reactions that will never be used by the network even if all exchange fluxes are free (Burgard et al. 2004). In addition to these blocked reactions are reactions needed for unsimulated environments (e.g., anaerobic growth with alternate electron acceptors such as dimethylsulfoxide or nitrate) and reactions that are less efficient than others (e.g., oxidative phosphorylation reactions that transfer different amounts of protons across the membrane). This first-pass analysis of the mixed optima can be advanced by a more detailed investigation of how reactions are used relative to each other and with respect to specific environmental conditions.

Correlated Reaction Sets

Further examination of the mixed optima identified sets of reactions that are always used together in a flux distribution; these reaction sets have been previously referred to as `correlated reactions,' `fully coupled reactions,' and `reaction/enzyme subsets' (Papin et al. 2002; Schilling et al. 2002; Schuster et al. 2002; Burgard et al. 2004). The 66 correlated reaction sets calculated in the present study arise from flux distributions which maximize biomass yields, leading to the hypothesis that the reactions in a correlated reaction set would have similar regulation if a cell uses its metabolic network to maximize biomass production. The number of reactions in a set of correlated reactions varied between two and nine, with a set size of two being the most frequent (Fig. 3).

Figure 3
Distribution of correlated reaction sets. The 66 correlated reaction sets can be categorized by the number of reactions in a set as well as the set's metabolic purpose (metabolite biosynthesis or degradation or other). Using known regulation from databases ...

The regulatory structure in E. coli—taken from EcoCyc (Karp et al. 2002), RegulonDB (Salgado et al. 2004), and primary literature data—was used to test whether the genes associated with correlated reaction sets have been found to be coregulated. For 9% of the correlated reaction sets no comparisons could be made, because there were not at least two reactions in the set with associated genes. Comparisons between regulon information and correlated reaction sets showed that 68% of the correlated reaction sets were consistent with established regulatory structure (where more than half of the associated genes belonged to the same regulon), and for 23% of the correlated reaction sets there was only weak evidence of conserved regulatory structure (less than half of the associated genes belonged to the same regulon). The transcriptional regulatory network in E. coli, however, has not been completely elucidated. Recent analysis of expression data in the context of a regulatory model indicates that roughly only 25% of the transcriptional regulatory mechanisms have been described in the primary literature (Covert et al. 2004).

As a further comparison, expression data were used to determine whether genes in a correlated reaction set have similar expression patterns under different conditions. Using publicly available AffyMetrix data, from the ASAP database, (Allen et al. 2003), the average pairwise correlation coefficient between genes involved in a reaction set was calculated. P-values, indicating whether the correlation coefficient is significant, were calculated by computing the average correlation coefficients from a random set of genes (see Methods). The resulting average correlation coefficients and associated P-values for the correlated reaction sets are shown in Figure 4A (also available in Supplemental materials). For comparison the average correlation coefficients and P-values for genes belonging to the same transcription unit (Karp et al. 2002) were calculated (Fig. 4B). Only transcription units with at least two genes with measured expression data were included in the analysis.

Figure 4
Correlation of genes in correlated reaction sets and transcription units based on expression data. The average correlation coefficients between genes associated with reactions in correlated reaction sets was calculated using a set of publicly available ...

The transcription units appear to have more significant correlation than the correlated reaction sets: 18 out of 66 (27.3%) correlated reaction sets have P-values less than 0.05, whereas 159 of 321 (49.5%) transcription units have P-values less than 0.05. Almost all of the correlated reaction sets with significant correlation were classified as having evidence of being part of the same regulon (17). The number of times a correlated reaction set is used in the mixed optima does not seem to affect how well the set correlates with the expression data.

Condition-Dependent Fluxes

The condition-specific optimal solutions for glucose aerobic and glucose anaerobic growth were further analyzed to investigate whether the transcriptional regulatory network controlling metabolic enzymes pushes the cell towards an optimal state. The fraction of solutions that use a particular reaction under glucose aerobic conditions was compared to the fraction for glucose anaerobic conditions (Fig. 5, also available in Supplemental materials). Each point in Figure 5 represents a different metabolic reaction. One may expect that reactions that fall below the line (meaning that they are used more highly in glucose aerobic optima) would have their genes up-regulated under aerobic conditions and down-regulated under anaerobic conditions; the opposite would be true for reactions falling above the line.

Figure 5
Preferential usage of fluxes under aerobic glucose vs. anaerobic glucose optimal growth. Each point in the graph represents one of the 931 metabolic reactions; the x-axis plots the fraction of optimal solutions that utilize that reaction under glucose ...

A previously developed regulatory model, iMC1010v2, was used to compare these predictions with the transcriptional regulatory structure in E. coli. The iMC1010v2 model was constructed based on established regulatory mechanisms and went through one iteration of improvement using transcription factor knockout strains whose expression was measured under glucose aerobic and glucose anaerobic conditions (Covert et al. 2004). The transcriptional regulatory model predicted for 61 reactions that more isozymes would be expressed under aerobic glucose conditions (aerobic reaction set), and for 53 reactions that more isozymes would be expressed under anaerobic glucose conditions (anaerobic reaction set). Black squares in Figure 5 are used to indicate reactions belonging to the aerobic reaction set, white squares for reactions in the anaerobic reaction set, and gray squares for the remaining reactions in the model. Assuming that the bacteria operate optimally, it would be expected that the white points would fall above the line and the black points below the line.

Most of the reactions that have predicted changes in the expression of associated genes, based on iMC1010v2, are not used in any of the condition-specific optimal solutions for glucose aerobic or glucose anaerobic conditions. For both the aerobic and anaerobic reaction sets, nine reactions in each set fell on the predicted half of the graph. Discrepancies occurred in both reaction sets, where more isozymes are expressed under the condition that utilizes a reaction the least under optimal conditions.

One of the two discrepancies in the aerobic reaction set is used only slightly more in the anaerobic case (fanaerobic = 0.681 vs. faerobic = 0.676); further sampling of the alternate optima could result in higher aerobic usage (faerobic). The other aerobic reaction set discrepancy is for the glycolate transport reaction (GLYCLT2r) which is used in all glucose anaerobic optimal solutions and in no glucose aerobic optimal solutions. Deletion of GLYCLT2r from the network only drops the maximal anaerobic growth rate by 0.1%, so even if the transporter is not expressed there would be negligible differences in observed growth rate. It is also important to note that the model predicts that glycolate cannot serve as a carbon source under anaerobic conditions, explaining why this transporter might be expressed aerobically.

There are also six discrepancies for reactions in the aerobic reaction set: hydrogenase reactions (HYD1, HYD2, HYD3), fumarate reductase (FRD2), formate-hydrogen lyase (FHL), and formate dehydrogenase (FDH2). Like the glycolate transporter, all six reactions can be deleted simultaneously with negligible effect on the predicted maximal growth rate (drops only by 0.1%). The fumarate reductase enzyme is responsible for two reactions, FRD2 and FRD3 (where different quinones are used as electron donors in the two reactions). The FRD3 reaction is used in all glucose anaerobic optima, explaining why the fumarate reductase enzyme is expressed under anaerobic conditions.

Suboptimal Fluxes

The previous sections focused on properties of optimal solutions; this section examines properties of suboptimal solutions. Flux variability analysis rather than MILP was used to calculate the variability of fluxes for different fixed growth rates (99% maximal growth, 90% maximal growth, 50% maximal growth, and 25% maximal growth) under the 88 aerobic environments. For each of the aerobic growth conditions (see Table 1) the set of suboptimal reactions (those reactions that can be used in a suboptimal flux distribution) for the different suboptimal growth rate levels was identical. The fraction of environmental conditions, fenv, (out of a total of 88 considered) in which a reaction can be used in an optimal solution (gray curve) or suboptimal solution (black curve) was determined (Fig. 6; note that the 185 blocked reactions are left out of this figure). The black curve in Figure 6 is always greater than or equal to the gray curve, indicating that the set of optimal reactions (those reactions that can be used in an optimal flux distribution) is always a subset of the suboptimal reactions. Again reactions can be categorized according to their usage: 380 reactions are used under the same number of conditions optimally and suboptimally (ignoring blocked reactions), and 366 reactions are used under more growth conditions suboptimally than optimally (see Supplemental materials).

Figure 6
Reaction usage in optimal and suboptimal flux distributions. For each of the tested 88 aerobic environmental conditions, reactions were classified as being used in optimal solutions or used only in suboptimal solutions. The black line in the figure shows ...

Looking at the usage patterns of exchange fluxes in optimal and suboptimal aerobic solutions (these are not shown in Fig. 6) gives insights into how certain metabolites are used by the cell (Table 2). Some metabolites can only serve as carbon sources and will never be secreted either optimally or suboptimally; this is true for 52 extracellular metabolites. Thirteen extracellular metabolites can be secreted suboptimally and cannot be used as carbon sources (they can only serve as by-products). Another set of 30 metabolites can be used as a sole carbon source and can be secreted during suboptimal growth on any of the other carbon sources, and additionally six carbon sources can be secreted during suboptimal growth on only a limited number of other carbon sources. Finally, the metabolites urea, xanthine, uracil, and indole cannot serve as carbon sources and are secreted optimally under aerobic growth conditions on a select number of carbon sources.

Table 2.
Characterization of Extracellular Metabolites by Their Roles in Various Single Carbon Aerobic Environments

Interestingly, a set of 147 metabolic reactions are used suboptimally but never optimally (Fig. 6), so the presence of the responsible enzymes under the tested conditions would drive the cell towards a less optimal state. These reactions include ABC transporters, when alternate transport mechanisms are available; the first step in the Entner Dourdoroff pathway; phospholipid recycling; nucleotide degradation; transporters associated with by-products that cannot be utilized as carbon sources; and oxidative phosphorylation reactions.

DISCUSSION

Alternate optimal growth solutions exist in genome-scale models, and these solutions need to be studied. Herein, application of an MILP algorithm to a genome-scale metabolic model of E. coli revealed that the number of alternate optima for genome-scale models is often large, making it computationally challenging to enumerate all of the optima for some conditions. Analysis of the calculated sets of optimal solutions showed that: (1) only a small subset of reactions in the network have variable fluxes across optima, (2) correlated reaction sets showed moderate agreement with the regulatory structure elucidated to date using classical methods and with expression data, and (3) condition-dependent reactions help provide clues about network regulatory needs. The additional calculation of suboptimal flux distributions identified reactions which are used under more environmental conditions suboptimally than optimally.

There are many more alternate optima than the number of fluxes that can vary across the set of alternate optimal solutions. Only a relatively small subset of reactions in the network (140 of 931 internal reactions) has variable fluxes across optimal solutions. For only a few environmental conditions (2 of 136) will the exchange fluxes vary across optima, indicating that the internal state of the cell is where the variability lies. In some cases it will be difficult to exactly determine which optima a cell might use based on expression data or protein levels, as some variable reactions are carried out by the same enzymes. For example, there are 45 nucleotide salvage reactions whose fluxes can vary under at least one of the tested environmental conditions; however, there are only 19 different enzymes associated with these reactions. For some variable reactions, such as those involved in central metabolism, expression data, proteomic data, and flux data could be used to help identify which reaction a cell is utilizing. Another level of complication in identifying the physiological solution is that an affine convex combination of the basic alternate optima is also an alternate optima; it will also be a valid flux distribution with the same objective value. For these reasons it may prove too difficult to determine exactly what solution a living cell utilizes.

The first 500 alternate optima for each condition were used to identify sets of correlated reactions. The majority of these correlated reaction sets (45 of 66) were consistent with established regulatory structure. However, comparisons made using established transcriptional regulatory structure are limited by the fact that a large portion of the transcriptional regulatory network has not been elucidated. A better agreement with regulon structure and correlated reaction sets in genome-scale models could emerge as the transcriptional regulatory network is further characterized. It was recently shown that the consistency between transcriptional regulatory network structure and expression data can vary depending on the structural features of network elements and functional classes of genes (Herrgard et al. 2003). Analysis of expression data showed that only a quarter of the correlated reaction sets (27.3%) showed significant correlation (P < 0.05) across different conditions. However, 49.5% of the genes belonging to the same transcription unit showed significant correlation across expression data sets. In contrast, comparisons between correlated reaction sets and expression data in yeast found that the correlated reaction sets were highly correlated with expression data (Schuster et al. 2002). The yeast network studied, however, contained only central metabolism, and the study looked at expression data for only a small number of genes under two different environmental conditions.

Unraveling the transcriptional regulatory network in E. coli is currently of great interest to many researchers. Comparing condition-specific optima provides useful insights into why the bacteria choose to express certain enzymes under certain conditions. Comparisons between glucose anaerobic and glucose aerobic reaction usage in optimal solutions generated testable hypotheses, some of which have already been proven experimentally. It will be important to investigate multiple sets of conditional optima, as the reasons behind enzyme regulation might not be apparent by studying only two environmental conditions.

By looking at optimal and suboptimal flux distributions, reactions which are used only in suboptimal solutions can be identified. Why would E. coli over the course of evolution retain these enzymes, since these reactions are never used optimally? These enzymes might be useful for reasons not captured based on the assumed condition of optimal growth. Network topology cannot capture the importance of enzymes that might catalyze a less efficient overall reaction; these enzymes could have higher turnover rates, better kinetics, allosteric regulation, or other such reasons making them beneficial. For example, to convert pyruvate into acetyl-CoA, many of the aerobic optimal solutions use pyruvate formate lyase (PFL) rather than pyruvate dehydrogenase (PDH), because formate, a product of PFL reaction, can be converted to hydrogen gas whose electrons are then carried down the electron transport chain. Using PFL the cell can make slightly more ATP than if it used PDH (with NADH transferring electrons to the electron transport chain). The network topology alone cannot predict possible loss of hydrogen gas making PFL more efficient. As another example, E. coli has two cytochrome oxidases, one capable of translocating two protons and the other capable of translocating 2.5 protons per electron pair donated to oxygen. The enzymes have different affinities for oxygen, making one better than the other at different oxygenation levels (Gennis and Stewart 1996).

For the analysis presented in this paper, further sampling of the alternate optima would not significantly affect the results. From the flux variability analysis results, the set of fluxes that are always used or never used across the mixed alternate optima would not change; however, deviations could occur in the fraction of mixed optima (fopt) that use a particular reaction for those reactions in Figure 2 with fopt between 0 and 1. Investigation of the correlated reaction sets, in conjunction with the flux variability analysis results, also indicated that these sets would not change if more sampling of the optima took place. It should be noted, however, that biases in the sampling could affect other types of analysis of the resulting flux distributions, such as the distribution of flux values through individual reactions (Wiback et al. 2004).

Taken together, these in silico results indicate that an optimal E. coli growth phenotype might be achievable by a large number of internal flux distributions; distinguishing the differences between these optima experimentally might prove difficult. In addition, studying the optimal and suboptimal utilization of reactions in the network could lead to understanding why enzymes are expressed under different conditions.

METHODS

Metabolic Network

A recently reconstructed E. coli metabolic network was used in this study (Reed et al. 2003). Type III extreme pathways are thermodynamically infeasible combinations (Beard et al. 2002; Price et al. 2002) of reactions with no net production or consumption of metabolites. To avoid having flux distributions utilizing type III pathways, 17 reactions in the network were removed by constraining the flux through these reactions to zero: CYTDt2, ABUTt2, ACCOAL, GALUi, GLUt4, INSt2, LCADi, ADK1, ADNt2, PROt4, SERt4, ALARi, THMDt2, THRt4, URAt2, URIt2, and VPAMT [reaction abbreviations match those previously reported (Reed et al. 2003) and can be found in the Supplemental materials].

Environmental Conditions Tested

All external metabolites were tested in silico for their ability to support aerobic and/or anaerobic growth in minimal medium. Exchange fluxes for ammonium, phosphate, sulfate, CO2, Fe2+, K+, Na+, water, and protons were allowed to be free. Uptake rates for oxygen were set to a maximum of 1000 and 0 mmole/g DW-h for aerobic and anaerobic simulations, respectively. Substrate uptake rates for the tested carbon sources were set to 10 mmole/g DW-h. Of the 143 external metabolites in the model, 88 supported aerobic growth as the sole carbon source, and only 48 of these also supported anaerobic growth with a biomass yield greater than 0.005 (Table 1). A total of 136 conditions (88 aerobic and 48 anaerobic) were then used to calculate sets of alternative optima.

MILP Algorithm

A recursive algorithm for calculating alternate optima using MILP has been published (Lee et al. 2000). These alternate optima utilize different sets of reactions. This algorithm was used to study a genome-scale metabolic network of E. coli. Some minor alterations to the algorithm were implemented and are described in more detail. The LP for this network is of the following form:

equation M1
1a

equation M2
1b

equation M3
1c

where S is the stoichiometric matrix, v is the steady-state flux vector, and α and β are the upper and lower limits for the individual flux values. In the prior algorithm (Lee et al. 2000), inequality constraints were transformed to standard form by introduction of slack variables. This initial transformation was not done in the present study, because the only inequality constraints were for the individual fluxes; instead the problem remained in the form stated above. The following additional constraints were adopted to ensure that different solutions are calculated (the ranges on the last set of constraints (2d) were modified from the original version of the algorithm):

equation M4
2a

equation M5
2b

equation M6
2c

equation M7
2d

At each iteration, J, at least one of the non-zero fluxes from the previous solution (NZJ-1) must be set to zero, where the binary variable yi is 1 if that flux is selected to be removed from the basis at iteration J (equation 2a). The binary variable wi is subsequently forced to zero if yi is one (equation 2c), and the upper and lower bounds for that particular flux are then constrained to zero (equation 2d). Equation 2b ensures that alternate bases are not revisited by eliminating at least one non-zero variable found in previous iterations. The cplex solver in GAMS (GAMS Development) was used to enumerate the first 500 optima for each environmental condition.

Correlated Reactions Sets

A binary matrix (B) was formed from the set of mixed optimal solutions, where rows correspond to different reactions and columns are different optima. Non-zero fluxes have an entry of 1 in the binary matrix, and zero fluxes have an entry of zero. The correlated reaction sets could be determined by studying the resulting reaction participation matrix (BBT; Papin et al. 2002).

Transcriptional Regulatory Network Predictions

A previously developed transcriptional regulatory model (iMC1010v2; Covert et al. 2004) was used to simulate what genes are expressed under glucose minimal media conditions in the presence and absence of oxygen. The model uses Boolean rules to determine whether a gene is expressed or not expressed under given environmental conditions (Covert et al. 2001).

Analysis of Expression Data

AffyMetrix expression data sets (20) were collected from the ASAP database (Allen et al. 2003); replicates were averaged. The estimated transcript copy number was used in the calculations (for a description see https://asap.ahabs.wisc.edu/~glasner/Protocols/DataDefinitionDefinitions.txt). Gene-protein-reaction (GPR) associations from the iJR904 model (Reed et al. 2003) were used to translate reaction sets into gene sets. MATLAB (Math-works) was used to calculate all pairwise correlation coefficients between associated genes across different data sets. The individual pairwise correlation coefficients for a given reaction set were then averaged, yielding the reported average correlation coefficient. When isozymes were available for a given reaction, the average correlation coefficients were calculated separately using the different isozymes, and only the highest correlation coefficient was reported for the set. P-values were calculated using 100,000 randomly selected gene sets from the 904 genes included in the model. The transcription units were downloaded from EcoCyc (Karp et al. 2002); only the 321 transcription units containing at least two genes with measured expression data were used in the analysis. Average correlation coefficients and P-values for transcription units were calculated as described above.

Acknowledgments

This work was funded by research grants from NIH (GM57089) and NSF (BES-01-20363). The authors and UCSD disclose potential conflicting financial interests.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.

Footnotes

[Supplemental material is available online at www.genome.org.]

Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2546004.

References

  • Allen, T.E., Herrgard, M.J., Liu, M., Qiu, Y., Glasner, J.D., Blattner, F.R., and Palsson, B.Ø. 2003. Genome-scale analysis of the uses of the Escherichia coli genome: Model-driven analysis of heterogeneous data sets. J. Bacteriol. 185: 6392-6399. [PMC free article] [PubMed]
  • Almaas, E., Kovacs, B., Vicsek, T., Oltvai, Z.N., and Barabasi, A.L. 2004. Global organization of metabolic fluxes in the bacterium Escherichia coli. Nature 427: 839-843. [PubMed]
  • Beard, D.A., Liang, S.D., and Qian, H. 2002. Energy balance for analysis of complex metabolic networks. Biophys. J. 83: 79-86. [PMC free article] [PubMed]
  • Burgard, A.P. and Maranas, C.D. 2001. Probing the performance limits of the Escherichia coli metabolic network subject to gene additions or deletions. Biotechnol. Bioeng. 74: 364-375. [PubMed]
  • Burgard, A.P., Vaidyaraman, S., and Maranas, C.D. 2001. Minimal reaction sets for Escherichia coli metabolism under different growth requirements and uptake environments. Biotechnol. Prog. 17: 791-797. [PubMed]
  • Burgard, A.P., Pharkya, P., and Maranas, C.D. 2003. Optknock: A bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol. Bioeng. 84: 647-657. [PubMed]
  • Burgard, A.P., Nikolaev, E.V., Schilling, C.H., and Maranas, C.D. 2004. Flux coupling analysis of genome-scale metabolic network reconstructions. Genome Res. 14: 301-312. [PMC free article] [PubMed]
  • Covert, M.W., Schilling, C.H., and Palsson, B. 2001. Regulation of gene expression in flux balance models of metabolism. J. Theoret. Biol. 213: 73-88. [PubMed]
  • Covert, M.W., Knight, E.M., Reed, J.L., Herrgard, M.J., and Palsson, B.Ø. 2004. Integrating high-throughput and computational data elucidates bacterial networks. Nature 429: 92-96. [PubMed]
  • Edwards, J.S. and Palsson, B.Ø. 2000. The Escherichia coli MG1655 in silico metabolic genotype: Its definition, characteristics, and capabilities. Proc. Natl. Acad. Sci. 97: 5528-5533. [PMC free article] [PubMed]
  • Fong, S.S., Marciniak, J.Y., and Palsson, B.Ø. 2003. Description and interpretation of adaptive evolution of Escherichia coli K-12 MG1655 using a genome-scale in silico metabolic model. J. Bacteriol. 185: 6400-6408. [PMC free article] [PubMed]
  • Forster, J., Famili, I., Palsson, B.Ø., and Nielsen, J. 2003. Large-scale evaluation of in silico gene knockouts in Saccharomyces cerevisiae. OMICS 7: 193-202. [PubMed]
  • Gennis, R.B. and Stewart, V. 1996. Respiration. In Escherichia coli and salmonella (ed. F.C. Neidhardt), pp. 217-261. ASM Press, Washington, DC.
  • Gerdes, S.Y., Scholle, M.D., Campbell, J.W., Balazsi, G., Ravasz, E., Daugherty, M.D., Somera, A.L., Kyrpides, N.C., Anderson, I., Gelfand, M.S., et al. 2003. Experimental determination and system level analysis of essential genes in Escherichia coli MG1655. J. Bacteriol. 185: 5673-5684. [PMC free article] [PubMed]
  • Herrgard, M.J., Covert, M.W., and Palsson, B.Ø. 2003. Reconciling gene expression data with known genome-scale regulatory network structures. Genome Res. 13: 2423-2434. [PMC free article] [PubMed]
  • Ibarra, R.U., Edwards, J.S., and Palsson, B.Ø. 2002. Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature 420: 186-189. [PubMed]
  • Karp, P.D., Riley, M., Saier, M., Paulsen, I.T., Collado-Vides, J., Paley, S.M., Pellegrini-Toole, A., Bonavides, C., and Gama-Castro, S. 2002. The EcoCyc Database. Nucleic Acids Res. 30: 56-58. [PMC free article] [PubMed]
  • Kauffman, K.J., Prakash, P., and Edwards, J.S. 2003. Advances in flux balance analysis. Curr. Opin. Biotechnol. 14: 491-496. [PubMed]
  • Lee, S., Phalakornkule, C., Domach, M.M., and Grossmann, I.E. 2000. Recursive MILP model for finding all the alternate optima in LP models for metabolic networks. Comp. Chem. Eng. 24: 711-716.
  • Mahadevan, R. and Schilling, C.H. 2003. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab. Eng. 5: 264-276. [PubMed]
  • Papin, J.A., Price, N.D., and Palsson, B.Ø. 2002. Extreme pathway lengths and reaction participation in genome-scale metabolic networks. Genome Res. 12: 1889-1900. [PMC free article] [PubMed]
  • Papin, J.A., Price, N.D., Wiback, S.J., Fell, D.A., and Palsson, B.Ø. 2003. Metabolic pathways in the post-genome era. Trends Biochem. Sci. 28: 250-258. [PubMed]
  • Phalakornkule, C., Lee, S., Zhu, T., Koepsel, R., Ataai, M.M., Grossmann, I.E., and Domach, M.M. 2001. A MILP-based flux alternative generation and NMR experimental design strategy for metabolic engineering. Metab. Eng. 3: 124-137. [PubMed]
  • Price, N.D., Famili, I., Beard, D.A., and Palsson, B.Ø. 2002. Extreme pathways and Kirchhoff's second law. Biophys. J. 83: 2879-2882. [PMC free article] [PubMed]
  • Price, N.D., Papin, J.A., Schilling, C.H., and Palsson, B. 2003. Genome-scale microbial in silico models: The constraints-based approach. Trends Biotechnol. 21: 162-169. [PubMed]
  • Price, N.D., Schellenberger, J., and Palsson, B.Ø. 2004. Uniform sampling of steady state flux spaces: Means to design experiments and to interpret enzymopathies. J. Biol. Chem. (in press). [PMC free article] [PubMed]
  • Raamsdonk, L.M., Teusink, B., Broadhurst, D., Zhang, N., Hayes, A., Walsh, M.C., Berden, J.A., Brindle, K.M., Kell, D.B., Rowland, J.J., et al. 2001. A functional genomics strategy that uses metabolome data to reveal the phenotype of silent mutations. Nat. Biotechnol. 19: 45-50. [PubMed]
  • Reed, J.L. and Palsson, B.Ø. 2003. Thirteen years of building constraint-based in silico models of Escherichia coli. J. Bacteriol. 185: 2692-2699. [PMC free article] [PubMed]
  • Reed, J.L., Vo, T.D., Schilling, C.H., and Palsson, B.Ø. 2003. An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biol. 4: R54.51-R54.12. [PMC free article] [PubMed]
  • Salgado, H., Gama-Castro, S., Martinez-Antonio, A., Diaz-Peredo, E., Sanchez-Solano, F., Peralta-Gil, M., Garcia-Alonso, D., Jimenez-Jacinto, V., Santos-Zavaleta, A., Bonavides-Martinez, C., et al. 2004. RegulonDB (version 4.0): Transcriptional regulation, operon organization and growth conditions in Escherichia coli K-12. Nucleic Acids Res. 32: D303-306. [PMC free article] [PubMed]
  • Schilling, C.H., Covert, M.W., Famili, I., Church, G.M., Edwards, J.S., and Palsson, B.Ø. 2002. Genome-scale metabolic model of Helicobacter pylori 26695. J. Bacteriol. 184: 4582-4593. [PMC free article] [PubMed]
  • Schuster, S., Klamt, S., Weckwerth, W., Moldenhauer, F., and Pfeiffer, T. 2002. Use of network analysis of metabolic systems in bioengineering. Bioprocess Biosyst. Eng. 24: 363-372.
  • Segre, D., Vitkup, D., and Church, G.M. 2002. Analysis of optimality in natural and perturbed metabolic networks. Proc. Natl. Acad. Sci. 99: 15112-15117. [PMC free article] [PubMed]
  • Stelling, J., Klamt, S., Bettenbrock, K., Schuster, S., and Gilles, E.D. 2002. Metabolic network structure determines key aspects of functionality and regulation. Nature 420: 190-193. [PubMed]
  • Varma, A. and Palsson, B.Ø. 1994. Metabolic flux balancing: Basic concepts, scientific and practical use. Bio/Technology 12: 994-998.
  • Wiback, S.J., Famili, I., Greenberg, H.J., and Palsson, B.Ø. 2004. Monte Carlo sampling can be used to determine the size and shape of the steady state flux space. J. Theor. Biol. 228: 437-447. [PubMed]

WEB SITE REFERENCES


Articles from Genome Research are provided here courtesy of Cold Spring Harbor Laboratory Press
PubReader format: click here to try

Formats:

Related citations in PubMed

See reviews...See all...

Cited by other articles in PMC

See all...

Links

Recent Activity

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

See more...