
# The *Escherichia coli* MG1655 *in silico* metabolic genotype: Its definition, characteristics, and capabilities

^{*}Present address: Department of Genetics, Harvard Medical School, Boston, MA 02115.

^{†}To whom reprint requests should be addressed. E-mail: palsson@ucsd.edu.

## Abstract

The *Escherichia coli* MG1655 genome has been completely sequenced. The annotated sequence, biochemical information, and other information were used to reconstruct the *E. coli* metabolic map. The stoichiometric coefficients for each metabolic enzyme in the *E. coli* metabolic map were assembled to construct a genome-specific stoichiometric matrix. The *E. coli* stoichiometric matrix was used to define the system's characteristics and the capabilities of *E. coli* metabolism. The effects of gene deletions in the central metabolic pathways on the ability of the *in silico* metabolic network to support growth were assessed, and the *in silico* predictions were compared with experimental observations. It was shown that based on stoichiometric and capacity constraints the *in silico* analysis was able to qualitatively predict the growth potential of mutant strains in 86% of the cases examined. Herein, it is demonstrated that the synthesis of *in silico* metabolic genotypes based on genomic, biochemical, and strain-specific information is possible, and that systems analysis methods are available to analyze and interpret the metabolic phenotype.

**Keywords:** bioinformatics, metabolism, genotype-phenotype relation, flux balance analysis

The complete genome sequences of a number of microorganisms have been established (The Institute for Genomic Research at www.tigr.org). The genome sequencing efforts and the subsequent bioinformatic analyses have defined the molecular “parts catalogue” for a number of living organisms. However, it is evident that cellular functions are multigenic in nature; thus, one must go beyond a molecular parts catalogue to elucidate integrated cellular functions based on the molecular cellular components (1). Therefore, to analyze the properties and the behavior of complex cellular networks, one needs to use methods that focus on the systemic properties of the network. Approaches to analyze, interpret, and ultimately predict cellular behavior based on genomic and biochemical data likely will involve bioinformatics and computational biology and form the basis for subsequent bioengineering analysis.

In moving toward the goal of developing an integrated description of cellular processes, it should be recognized that there exists a history of studying the systemic properties of metabolic networks (2) and many mathematical methods have been developed to carry out such studies. These methods include approaches such as metabolic control analysis (3, 4), flux balance analysis (FBA) (5–7), metabolic pathway analysis (8–11, 69), cybernetic modeling (12), biochemical systems theory (13), temporal decomposition (14), and so on. Although many mathematical methods and approaches have been developed, there are few comprehensive metabolic systems for which detailed kinetic information is available and where such detailed analysis can be carried out (see refs. 15–17 for a few noteworthy exceptions).

To analyze, interpret, and predict cellular behavior, each individual step in a biochemical network must be described, normally with a rate equation that requires a number of kinetic constants. Unfortunately, it currently is not possible to formulate this level of description of cellular processes on a genome scale. The kinetic parameters cannot be estimated from the genome sequence and these parameters are not available in the literature. In the absence of kinetic information, it is, however, still possible to assess the theoretical capabilities of one integrated cellular process, namely metabolism, and examine the feasible metabolic flux distributions under a steady-state assumption. The steady-state analysis is based on the constraints imposed on the metabolic network by the stoichiometry of the metabolic reactions, which basically represent mass balance constraints. The steady-state analysis of metabolic networks based on the mass balance constraints is known as FBA (7, 18, 19). This analysis differs from detailed kinetic modeling of cellular processes, in that it does not attempt to predict the exact behavior of metabolic networks. Rather it uses known constraints on the integrated function of multiple enzymes to separate the states that a system can reach from those that it cannot. Then within the domain of allowable behavior one can study the genotype-phenotype relation, such as the stoichiometric optimal growth performance in a defined environment.

In this manuscript, we have used the biochemical literature, the annotated genome sequence data, and strain-specific information to formulate an organism-scale *in silico* representation of the *Escherichia coli* MG1655 metabolic capabilities. FBA then was used to assess metabolic capabilities subject to these constraints, leading to qualitative predictions of growth performance.

## Materials and Methods

### Definition of the *E. coli* MG1655 Metabolic Map.

An *in silico* representation of *E. coli* metabolism has been constructed. We have used the biochemical literature (20), genomic information (21), and the metabolic databases (22–24). Because of the long history of *E. coli* research, there was biochemical or genetic evidence for every metabolic reaction included in the *in silico* representation, and in most cases, there was both genetic and biochemical evidence (Table 1). The complete list of genes included in the *in silico* analysis is shown in Table 1, and the metabolic reactions catalyzed by these genes can be found on the web (http://gcrg.ucsd.edu/downloads.html). The stoichiometric coefficients for each metabolic reaction within this list were used to form the stoichiometric matrix **S**.

### Determining the Capabilities of the *E. coli* Metabolic Network.

The theoretical metabolic capabilities of *E. coli* were assessed by FBA (5–7). The metabolic capabilities of the *in silico* metabolic genotype were partially defined by mass balance constraints, mathematically represented by a matrix equation:

$$\mathbf{S} \cdot \mathbf{v} = 0 \qquad [1]$$

The matrix **S** is the *m* × *n* stoichiometric matrix, where *m* is the number of metabolites and *n* is the number of reactions in the network. The *E. coli* stoichiometric matrix was 436 × 720. The vector **v** represents all fluxes in the metabolic network, including the internal fluxes, transport fluxes, and the growth flux. The optimal **v** vector was determined and defined the steady-state metabolic flux distribution.

For the *E. coli* metabolic network, the number of fluxes was greater than the number of mass balance constraints; thus, there was a plurality of feasible flux distributions that satisfied the mass balance constraints (defined in Eq. 1), and the solutions (or feasible metabolic flux distributions) were confined to the nullspace of the matrix **S**.
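The mass balance constraint and the resulting plurality of solutions can be illustrated on a very small example. The sketch below uses a hypothetical four-reaction toy network (not the *E. coli* model); the matrix, the flux values, and the helper names `mass_balance_residual` and `rank` are assumptions made for illustration only.

```python
from fractions import Fraction

# Hypothetical toy network (NOT the E. coli model): 2 metabolites, 4 fluxes.
#   v1: -> A (uptake)   v2: A -> B   v3: B -> (drain)   v4: A -> (by-product)
S = [
    [1, -1, 0, -1],  # mass balance on A
    [0, 1, -1, 0],   # mass balance on B
]

def mass_balance_residual(S, v):
    """Return S.v, which must be the zero vector at steady state."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]

def rank(matrix):
    """Matrix rank by Gauss-Jordan elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

# Two different flux distributions that both satisfy S.v = 0.
v_a = [10, 6, 6, 4]
v_b = [10, 10, 10, 0]

# Dimension of the nullspace = number of fluxes minus rank of S.
nullspace_dim = len(S[0]) - rank(S)
```

Here both `v_a` and `v_b` balance every metabolite, and the nullspace has two dimensions. By the same counting argument, a 436 × 720 matrix leaves a nullspace of at least 720 − 436 = 284 dimensions.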

In addition to the mass balance constraints, we imposed constraints on the magnitude of each individual metabolic flux:

$$\alpha_i \le v_i \le \beta_i \qquad [2]$$

The linear inequality constraints were used to enforce the reversibility/irreversibility of metabolic reactions and the maximal metabolic fluxes in the transport reactions. The intersection of the nullspace and the region defined by the linear inequalities formally defined a region in flux space that we will refer to as the feasible set. The feasible set defined the capabilities of the metabolic network subject to the subset of cellular constraints, and all feasible metabolic flux distributions lie within the feasible set (see Fig. 1). However, not every vector **v** within the feasible set is reachable by the cell under a given condition because of other constraints not considered in the analysis (e.g., maximal internal fluxes and gene regulation). The feasible set can be further reduced by imposing additional constraints, and if all of the necessary details to describe metabolic dynamics are known, then the feasible set may reduce to a small region or even a single point (see Fig. 1).

For the analysis presented herein, we defined *α*_{i} = 0 for irreversible internal fluxes and *α*_{i} = −∞ for reversible internal fluxes. The reversibility of the metabolic reactions was determined from the biochemical literature and is identified for each reaction on the web site. The transport flux for inorganic phosphate, ammonia, carbon dioxide, sulfate, potassium, and sodium was unrestrained (*α*_{i} = −∞ and *β*_{i} = ∞). The transport flux for the other metabolites, when available in the *in silico* medium, was constrained between zero and the maximal level (0 < *v*_{i} < *v*_{i}^{max}). However, when the metabolite was not available in the medium, the transport flux was constrained to zero. The transport flux for metabolites that were capable of leaving the metabolic network (e.g., acetate, ethanol, lactate, succinate, formate, and pyruvate) always was unconstrained in the outward direction.
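The bound rules in this paragraph can be written down as a small helper. This is a minimal sketch under stated assumptions: uptake is counted as the positive direction (so secretion is negative), internal fluxes are given no upper limit because maximal internal fluxes were not considered, and the function names, metabolite names, and the maximal uptake value are hypothetical.

```python
import math

INF = math.inf

def internal_bounds(reversible):
    """(alpha, beta) for an internal flux.

    alpha follows the text: 0 if irreversible, -inf if reversible.
    beta = +inf is an assumption (no maximal internal flux imposed).
    """
    return (-INF, INF) if reversible else (0.0, INF)

def transport_bounds(metabolite, medium, v_max, unrestrained=(), secretable=()):
    """(alpha, beta) for a transport flux, with uptake counted as positive."""
    if metabolite in unrestrained:
        return (-INF, INF)
    # By-products that can leave the network are unconstrained outward.
    lower = -INF if metabolite in secretable else 0.0
    # Uptake is allowed only if the metabolite is in the in silico medium.
    upper = v_max[metabolite] if metabolite in medium else 0.0
    return (lower, upper)

def within_bounds(v, bounds):
    """True if every flux lies inside its (alpha, beta) interval."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(v, bounds))

# Hypothetical medium and maximal uptake rate, for illustration only.
medium = {"glucose"}
v_max = {"glucose": 10.0}

glucose = transport_bounds("glucose", medium, v_max)
acetate = transport_bounds("acetate", medium, v_max, secretable={"acetate"})
phosphate = transport_bounds("phosphate", medium, v_max, unrestrained={"phosphate"})
```

A flux vector belongs to the feasible set when it passes both this bound check and the mass balance check.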

A particular metabolic flux distribution within the feasible set was found by using linear programming (LP). A commercially available LP package was used (LINDO, Lindo Systems, Chicago). LP identified a solution that minimized a particular metabolic objective (subject to the imposed constraints) (5, 25, 26), and was formulated as follows. Minimize −*Z*, where

$$Z = \sum_i c_i v_i = \langle \mathbf{c} \cdot \mathbf{v} \rangle \qquad [3]$$

The vector **c** was used to select a linear combination of metabolic fluxes to include in the objective function (27). Herein, **c** was defined as the unit vector in the direction of the growth flux, and the growth flux was defined in terms of the biosynthetic requirements:

$$\sum_{\text{all } m} d_m \cdot X_m \xrightarrow{v_{\text{growth}}} \text{Biomass} \qquad [4]$$

where *d*_{m} is the biomass composition of metabolite *X*_{m} (defined from the literature; ref. 28), and the growth flux is modeled as a single reaction that converts all of the biosynthetic precursors into biomass.
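The LP formulation can be sketched on a toy problem. The example below is not the authors' LINDO setup: it substitutes SciPy's `linprog` as the solver, and the three-metabolite network, its bounds, and the biomass coefficients (1 unit of B and 2 units of C per unit of biomass) are hypothetical values chosen so that the optimum is easy to verify by hand.

```python
from scipy.optimize import linprog

# Hypothetical toy network (NOT the E. coli model).
# Columns: v_up (-> A), v1 (A -> B), v2 (A -> C), v_sec (C ->), v_growth
# The growth flux drains the precursors: 1 B + 2 C -> biomass.
S = [
    [1, -1, -1, 0, 0],   # mass balance on A
    [0, 1, 0, 0, -1],    # mass balance on B
    [0, 0, 1, -1, -2],   # mass balance on C
]

# alpha_i <= v_i <= beta_i; uptake is capped at 10, all else irreversible.
bounds = [(0, 10), (0, None), (0, None), (0, None), (0, None)]

# c is the unit vector in the direction of the growth flux.
c = [0, 0, 0, 0, 1]

# linprog minimizes, so maximizing Z = c.v means minimizing -Z.
result = linprog(
    [-ci for ci in c],
    A_eq=S,
    b_eq=[0, 0, 0],
    bounds=bounds,
)

v_opt = result.x      # one optimal steady-state flux distribution
Z = -result.fun       # optimal growth flux
```

Because each unit of biomass needs three units of A in this toy network, the optimal growth flux is one third of the uptake limit.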

## Results

FBA was used to examine the change in the metabolic capabilities caused by gene deletions. To simulate a gene deletion, the flux through the corresponding enzymatic reaction was restricted to zero. Genes that code for isozymes or genes that code for components of the same enzyme complex were simultaneously removed (e.g., *aceEF*, *sucCD*). The optimal value of the objective (*Z*_{mutant}) was compared with the “wild-type” objective (*Z*) to determine the systemic effect of the gene deletion. The ratio of optimal growth yields (*Z*_{mutant}*/Z*) was calculated (Fig. 2).
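The deletion procedure can be sketched as follows: clamp a reaction's bounds to zero, re-solve the LP, and report the ratio of optimal objectives. The network here is a hypothetical toy with one efficient route (`r1`) and one wasteful alternate route (`r2`), solved with SciPy's `linprog` rather than the LP package used in the study; the reaction names and bounds are assumptions.

```python
from scipy.optimize import linprog

# Hypothetical toy network (NOT the E. coli model).
#   uptake: -> A      r1: A -> B      r2: 2 A -> B      growth: B -> biomass
REACTIONS = ["uptake", "r1", "r2", "growth"]
S = [
    [1, -1, -2, 0],   # mass balance on A
    [0, 1, 1, -1],    # mass balance on B
]
BOUNDS = {
    "uptake": (0, 10),
    "r1": (0, None),
    "r2": (0, None),
    "growth": (0, None),
}

def max_growth(deleted=()):
    """Optimal growth flux with the listed reactions restricted to zero."""
    bounds = [(0, 0) if name in deleted else BOUNDS[name] for name in REACTIONS]
    # Minimize -Z, where Z is the growth flux.
    c = [-1.0 if name == "growth" else 0.0 for name in REACTIONS]
    res = linprog(c, A_eq=S, b_eq=[0.0] * len(S), bounds=bounds)
    return -res.fun if res.success else 0.0

z_wild = max_growth()
ratios = {
    name: max_growth(deleted=(name,)) / z_wild
    for name in ("uptake", "r1", "r2")
}
# Reactions catalyzed by isozymes or one complex are removed together.
ratio_double = max_growth(deleted=("r1", "r2")) / z_wild
```

In this toy case, removing `r2` leaves the ratio at 1, removing `r1` halves it because the alternate route costs twice the substrate, and removing `uptake` or both routes gives 0, which corresponds to an essential function.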

Fig. 2. *E. coli* MG1655 central intermediary metabolism; maximal biomass yields on glucose for all possible single gene deletions in the central metabolic pathways. The optimal value of the mutant objective function (*Z*_{mutant}) compared with the **...**

### Gene Deletions.

*E. coli* MG1655 *in silico* was subjected to deletion of each individual gene product in the central metabolic pathways [glycolysis, pentose phosphate pathway (PPP), tricarboxylic acid (TCA) cycle, respiration processes], and the maximal capability of each *in silico* mutant metabolic network to support growth was assessed with FBA. The simulations were performed under an aerobic growth environment on minimal glucose medium.

The results identified the essential (required for growth) central metabolic genes (Fig. 2). For growth on glucose, the essential gene products were involved in the three-carbon stage of glycolysis, three reactions of the TCA cycle, and several points within the PPP. The remainder of the central metabolic genes could be removed and *E. coli in silico* maintained the potential to support cellular growth. This result was related to the interconnectivity of the metabolic reactions. The *in silico* gene deletion results suggest that a large number of the central metabolic genes can be removed without eliminating the capability of the metabolic network to support growth under the conditions considered.

### Are the *in Silico* Redundancy Results Consistent with Mutant Data?

The *in silico* gene deletion study results were compared with growth data from known mutants. The growth characteristics of a series of *E. coli* mutants on several different carbon sources were examined and compared with the *in silico* deletion results (Table 2). From this analysis, 86% (68 of 79 cases) of the *in silico* predictions were consistent with the experimental observations.

### How Are Cellular Fluxes Redistributed?

The potential of many *in silico* deletion strains to support growth led to questions regarding how the *E. coli* metabolic genotype deals with the loss of metabolic functions. The answer involves the degree of stoichiometric connectivity of key metabolites. For illustration, the flux redistributions to optimally support growth of a single mutant and a double mutant were investigated.

The optimal metabolic flux distribution for the *in silico* wild type was calculated (Fig. 3). The constraints used in the LP problem are defined in the figure legend. The *in silico* results suggest that optimally the oxidative branch of the PPP was used to generate a large fraction of the NADPH (66% *in silico* vs. 20–50% reported in the literature; ref. 29), and the TCA cycle produced NADH. The optimal flux distribution also suggested that the majority of the high-energy phosphate bonds were generated via oxidative phosphorylation and acetate secretion because of limitations of the oxygen supply.

Figure legend (fragment): *zwf*^{-} mutant. Biomass yield is 99% of the results for the full metabolic genotype. (Blue) *zwf*^{-} *pnt*^{-} mutant. Biomass yield is 92% of the results for **...**

The *in silico* gene deletion results predicted that the optimal biomass yield of the *zwf*^{-} (glucose-6-phosphate dehydrogenase) *in silico* strain was slightly less than that of the wild type. The optimal flux distribution of the *zwf*^{-} *in silico* strain (Fig. 2) was calculated, and the NADPH was optimally generated through the transhydrogenase reaction and an elevated TCA cycle flux. The PPP biosynthetic precursors were generated in the nonoxidative branch. This metabolic flux rerouting resulted in an optimal biomass yield that was 99% of the *in silico* wild type.

The transhydrogenase (*pnt*) also was deleted *in silico*, creating an *in silico* double deletion mutant and eliminating an alternate source of NADPH. The double mutant still maintained growth potential. The optimal flux distribution (Fig. 2) used the isocitrate dehydrogenase and the malic enzyme to produce NADPH. The optimal biomass yield of the double mutant was 92% of the *in silico* wild type. The FBA results were consistent with the experimental observations that the *zwf*^{-} strain (30) and the *pnt*^{-} strain (29) are able to grow at near wild-type yields. Furthermore, the *zwf*^{-} *pnt*^{-} double mutant strain also has been shown to grow (*μ*_{mutant}*/μ*_{wild type} = 57%) (29).

## Discussion

Extensive information about the molecular composition and function of several single-cellular organisms has become available. An important next step will be to incorporate the available information to generate whole-cell models with interpretative and predictive capability. Herein, we have taken a step in that direction by using a set of constraints on cellular metabolism on the whole-cell level to analyze the metabolic capabilities of the extensively studied bacterium *E. coli*. We have calculated the optimal metabolic network utilization with FBA. The *in silico* results, based only on stoichiometric and capacity constraints, were consistent with experimental data for the wild type and many of the mutant strains examined.

The construction of comprehensive *in silico* metabolic maps provided a framework to study the consequences of alterations in the genotype and to gain insight into the genotype-phenotype relation. The stoichiometric matrix and FBA were used to analyze the consequences of the loss of a gene product function on the metabolic capabilities of *E. coli*. The results demonstrated an important property of the *E. coli* metabolic network, namely that there are relatively few critical gene products in central metabolism. The nonessential genes in several organisms have been found experimentally on a genome scale (31, 32), which opens up the opportunity to critically test the *in silico* predictions. The *in silico* analysis also suggests that although the ability to grow in one defined environment is only slightly altered, the ability to adjust to different environments may be diminished (33). Therefore, the *in silico* analysis provides a methodology for relating the specific biochemical function of the metabolic enzymes to the integrated properties of the metabolic network.

The *in silico* analysis presented herein is not the typical metabolic modeling; more appropriately, the analysis can be thought of as a constraining approach. This approach defines the “best” the cell can do and identifies what the cell cannot do, rather than attempting to predict how the cell actually will behave under a given set of conditions. To accomplish this, we have used a set of physicochemical constraints for which there is reliable information available, in particular the stoichiometric properties. FBA does not directly consider regulation or the regulatory constraints on the metabolic network.

The results of FBA can be interpreted in a qualitative or a quantitative sense. At the first level we can ask whether a cell is able to grow under given circumstances and how a loss of the function of a gene product influences this ability. The results presented herein fall into this category. Quantitative predictions would hold true if the cell optimized its growth under the growth conditions considered. Therefore, when applying LP to predict quantitatively the optimal metabolic pathway utilization, it is assumed that the cell has found an “optimal solution” for survival through natural selection, and we have equated survival with growth. Although *E. coli* may grow optimally in defined media, one should not expect that optimizing growth is the governing objective of the cell under all growth conditions. For example, the regulatory mechanisms can only evolve to stoichiometric optimality in a condition to which the cell has been exposed. Furthermore, the growth behavior of mutant strains is unlikely to be optimal. However, FBA can still be used to delineate the metabolic capabilities of mutant cells based on constraining features, because both wild-type and mutant cells must obey the physicochemical constraints imposed.

The constraints on the system accurately reflect the steady-state capabilities of the metabolic network, but does the calculated optimal flux vector in the feasible set accurately reflect the behavior of the actual metabolic network? It has been shown that in minimal media the metabolic behavior of wild-type *E. coli* is consistent with stoichiometric optimality (34). Furthermore, more detailed and critical experimental results are consistent with the hypothesis that *E. coli* does optimize its growth in acetate or succinate minimal media (33). Taken together, these results call for critical experimental investigation to evaluate the hypothesis that stoichiometric and capacity constraints are the principal constraints that limit *E. coli* maximal growth. Even though growth and metabolic behavior in minimal media are consistent with FBA results, one still must determine the generality of optimal performance. The call for critical experimentation is particularly timely, given the increasing number of genome-scale measurements that are now possible through two-dimensional gels (35, 36) and DNA array technology (37, 38). Furthermore, the ability to precisely remove ORFs can be used to design critical experiments (39). The *in silico* model can be used to choose the most informative knockouts and to design growth experiments with the knockouts.

At the present time, the annotation of the *E. coli* genome is incomplete, and about one-third of its ORFs do not have a functional assignment. Thus, the metabolic genotype studied here may lack some metabolic capabilities that *E. coli* possesses. The biochemical literature also was used to define the *in silico* metabolic genotype, and given the long history of *E. coli* metabolic research (20), a large percentage of the *E. coli* metabolic capabilities likely have been identified. However, if additional metabolic capabilities are discovered (40), the *E. coli* stoichiometric matrix can be updated, leading to an iterative model building process. Additionally, the *in silico* analysis can help identify missing or incorrect functional assignments by identifying sets of metabolic reactions that are not connected to the metabolic network by the mass balance constraints.
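One simple way to look for reactions that the mass balance constraints disconnect from the rest of the network is to search for dead-end metabolites. The sketch below is a heuristic under stated assumptions, not the procedure used in the study: it flags a reaction as blocked when it touches a metabolite that the remaining reactions can only produce, only consume, or that a single reaction touches, and it repeats until nothing changes. It can miss blocked reactions that a full LP-based check would find. The toy network and the name `blocked_reactions` are hypothetical.

```python
def blocked_reactions(S, reversible):
    """Return indices of reactions forced to zero flux by dead-end metabolites.

    S is a list of rows (one per metabolite); reversible[j] tells whether
    reaction j may run backward. Irreversible fluxes are assumed to be >= 0.
    """
    n = len(S[0])
    blocked = set()
    changed = True
    while changed:
        changed = False
        for row in S:
            active = [j for j in range(n) if row[j] != 0 and j not in blocked]
            if not active:
                continue
            can_produce = any(row[j] > 0 or reversible[j] for j in active)
            can_consume = any(row[j] < 0 or reversible[j] for j in active)
            # A metabolite that cannot be balanced forces its reactions to zero.
            if len(active) == 1 or not (can_produce and can_consume):
                blocked.update(active)
                changed = True
    return blocked

# Hypothetical toy network: reactions 3 and 4 lead into a dead end at Y.
# Columns: 0: -> A   1: A -> B   2: B ->   3: A -> X   4: X -> Y
S = [
    [1, -1, 0, -1, 0],   # A
    [0, 1, -1, 0, 0],    # B
    [0, 0, 0, 1, -1],    # X
    [0, 0, 0, 0, 1],     # Y (produced but never consumed)
]
reversible = [False] * 5

dead = blocked_reactions(S, reversible)
```

In this toy case the dead end at Y first blocks reaction 4, which in turn leaves X without a consumer and blocks reaction 3. Reactions flagged this way are candidates for a missing or incorrect functional assignment.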

The ability to analyze, interpret, and ultimately predict cellular behavior has been a long sought-after goal. The genome sequencing projects are defining the molecular components within the cell, and describing the integrated function of these molecular components will be a challenging task. The results presented herein suggest that it may be possible to analyze cellular metabolism based on a subset of the constraining features. Continued prediction and experimental verification will be an integral part in the further development of *in silico* strains. Deciphering the complex relation between the genotype and the phenotype will involve the biological sciences, computer science, and quantitative analysis, all of which must be included in the bioengineering of the 21st century.

## Acknowledgments

We thank Ramprasad Ramakrishna, George Church, and Christophe Schilling for critical advice and input. National Institutes of Health Grant GM 57089 and National Science Foundation Grant MCB 9873384 supported this research.

## Abbreviations

- FBA
- flux balance analysis
- LP
- linear programming
- TCA
- tricarboxylic acid
- PPP
- pentose phosphate pathway

## References

**National Academy of Sciences**


The *Escherichia coli* MG1655 *in silico* metabolic genotype: Its definition, characteristics, and capabilities. *Proceedings of the National Academy of Sciences of the United States of America.* May 9, 2000; 97(10):5528.
