Copyright © 2008 Durot et al; licensee BioMed Central Ltd.

Iterative reconstruction of a global metabolic model of Acinetobacter baylyi ADP1 using high-throughput growth phenotype and gene essentiality data

Genoscope (Commissariat à l'Energie Atomique) and UMR 8030 CNRS-Genoscope-Université d'Evry, 2 rue Gaston Crémieux, CP5706, 91057 Evry, Cedex, France

Corresponding author.

Maxime Durot: mdurot/at/genoscope.cns.fr; François Le Fèvre: flefevre/at/genoscope.cns.fr; Véronique de Berardinis: vberard/at/genoscope.cns.fr; Annett Kreimeyer: akreimey/at/genoscope.cns.fr; David Vallenet: vallenet/at/genoscope.cns.fr; Cyril Combe: ccombe/at/genoscope.cns.fr; Serge Smidtas: smidtas/at/genoscope.cns.fr; Marcel Salanoubat: salanou/at/genoscope.cns.fr; Jean Weissenbach: jsbach/at/genoscope.cns.fr; Vincent Schachter: vs/at/genoscope.cns.fr

Received April 23, 2008; Accepted October 7, 2008.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Genome-scale metabolic models are powerful tools to study global properties of metabolic networks. They provide a way to integrate various types of biological information in a single framework, providing a structured representation of available knowledge on the metabolism of the respective species.

Results

We reconstructed a constraint-based metabolic model of Acinetobacter baylyi ADP1, a soil bacterium of interest for environmental and biotechnological applications with large-spectrum biodegradation capabilities.
Following initial reconstruction from genome annotation and the literature, we iteratively refined the model by comparing its predictions with the results of large-scale experiments: (1) high-throughput growth phenotypes of the wild-type strain on 190 distinct environments, (2) genome-wide gene essentialities from a knockout mutant library, and (3) large-scale growth phenotypes of all mutant strains on 8 minimal media. Out of 1412 predictions, 1262 were initially consistent with our experimental observations. Inconsistencies were systematically examined, leading in 65 cases to model corrections. The predictions of the final version of the model, which included three rounds of refinements, are consistent with the experimental results for (1) 91% of the wild-type growth phenotypes, (2) 94% of the gene essentiality results, and (3) 94% of the mutant growth phenotypes. To facilitate the exploitation of the metabolic model, we provide a web interface allowing online predictions and visualization of results on metabolic maps.

Conclusion

The iterative reconstruction procedure led to significant model improvements, showing that genome-wide mutant phenotypes on several media can significantly facilitate the transition from genome annotation to a high-quality model.

Background

The diversity of bacterial metabolism and the prospect of engineering applications have spurred a steep increase in both the number of sequencing projects and the volume of high-throughput experiments on bacteria. The need to interpret and integrate these datasets at the systems level has triggered the development of model-based computational methods [1]. Among them, the constraint-based modeling approach (CBM) has proved to be particularly efficient at integrating large-scale omics datasets related to metabolism, such as growth phenotypes, metabolite concentrations, or reaction fluxes [2].
In addition to providing a structured summary of metabolism-related knowledge for a given species, a constraint-based model allows the prediction and analysis of a variety of properties resulting from topological, stoichiometric, and physiological constraints known to apply at steady state to its global metabolic network. Applications range from studies on evolutionary or physiological properties to the design of metabolic engineering strategies for biotechnological or therapeutic purposes [3]. Nearly twenty such models have been built so far [2], typically through extensive curation work, and, for some of them, through iterative refinement processes where models were progressively improved by comparison with experimental datasets [4]. Systematic evaluation of gene essentiality has proved to be a valuable resource for investigating gene functions; knockout mutant collections have recently been built for this purpose for a number of bacteria [5-8]. Rigorous analysis of their results remains a challenging task, however, as gene essentiality depends on the environmental condition and the link between genes and essential functions may be blurred by genetic or metabolic redundancy [9,10]. Genome-scale metabolic models provide a valuable framework to help interpret essentiality screens, since they both recapitulate knowledge on metabolic networks and allow prediction of gene essentiality under well-defined conditions. They have also allowed meaningful cross-validation of reconstructed metabolic networks with sets of gene essentiality results, providing insights on potentially erroneous or incomplete metabolic knowledge, and on possible improvements [4,11,12]. In this article, we systematically exploit inconsistencies between model predictions and experimental results to improve a metabolic model reconstruction. Our focus is on Acinetobacter baylyi ADP1, a strictly aerobic γ-proteobacterium.
Although phylogenetically close to the pathogenic Acinetobacter baumannii strains, which are responsible for a growing number of nosocomial infections [13], A. baylyi ADP1 is an innocuous soil bacterium. Because of its metabolic versatility and high competency for natural genetic transformation, it is a model organism of choice for genetic and metabolic investigations [14-16]. As a soil bacterium, A. baylyi is able to degrade a wide range of molecules, including components of suberin, a protective polymer produced by plants in response to stress. Their harmlessness, nutritional versatility, and high capacity for adaptation have led bacteria of the Acinetobacter genus to be used for a variety of biotechnological applications, including the degradation of pollutants (e.g. biphenyl, phenol, benzoate, crude oil, nitriles) and the production of valuable biochemical products such as lipases, proteases, bioemulsifiers, cyanophycin and different kinds of biopolymers [17,18]. Following its sequencing and expert annotation [19], a genome-wide single-knockout mutant library was generated (ADP1 mutant collection [8]), enabling the high-throughput assessment of mutant phenotypes in defined growth conditions. We report below on the reconstruction and refinement of a genome-scale metabolic model for A. baylyi with the help of high-throughput experimental data. Following an initial reconstruction using metabolic information extracted from the genome annotation and the literature, the model was iteratively assessed and improved by comparing its predictions with (1) large-scale growth phenotyping results of the wild-type strain on 190 distinct environments, (2) genome-wide gene essentiality data from the mutant collection, and (3) conditional gene essentiality data derived from growth phenotyping of A. baylyi mutants on eight defined media.
We examined each inconsistency between experimental results and model predictions, and corrected the model when sufficient justifying evidence could be collected. Combining the three refinement steps, 1262 out of 1412 predictions were initially consistent with experimental results. Among the inconsistent cases, 65 led to improvements, increasing the completeness and accuracy of the model. The final version of the model, called iAbaylyiv4, predicted accurately (1) 91% of the wild-type growth phenotypes, (2) 94% of the genome-wide gene essentialities, and (3) 94% of the phenotypic profiles of A. baylyi mutants on the tested media. We developed a web interface which provides easy access to both model and experimental data. The interface allows browsing of the metabolic network, online computation of phenotype predictions, and comparison of predictions with experimental results [20].

Results and discussion

Initial model reconstruction

The genome-scale model of A. baylyi was iteratively reconstructed following a process depicted in Figure 1.
This initial reconstruction process led to the model iAbaylyiv1, gathering 859 reactions grouped in 7 metabolic categories and 697 distinct metabolites, 109 of which could be transported from the environment. As depicted in Figure 3,
iAbaylyiv1 involves 787 genes out of the 1518 confirmed or putative enzymatic and transport genes of A. baylyi. A large majority (94%, 681/726) of the enzymatic reactions (excluding transporters) were associated with at least one gene, while the lower proportion (83%, 110/133) of transport reactions linked to genes is explained by the extensive use of physiological data to include them. The association of nearly all reactions with a gene confers high reliability on the model. The few reactions that were introduced with no associated gene are most often supported by indirect evidence and were introduced in order to fill gaps (see Additional file 2). Most A. baylyi genes were annotated by expert curation; a third of the model genes relied on evidence conferring on them a medium confidence level, e.g. limited homology with genes of known function, or conservation of amino acid motifs (Figure 4).
Model validation and expansion using growth phenotype results

We used results of large-scale growth phenotyping experiments to perform a first round of model assessment and refinement. Using Biolog assays, we experimentally tested the wild-type strain's ability to use 190 distinct metabolites as sole carbon and energy sources (see Methods). Using the model, we predicted the growth phenotypes of the wild-type strain on the corresponding in silico media and compared them to the experimental results. Out of the 190 screened metabolites, 45 were found to be carbon and energy sources for A. baylyi. This relatively small fraction of carbon sources can be explained by the fact that Biolog microplates are only partially adapted to A. baylyi's biotope: they feature sugars, nucleosides or amino acids but relatively few chemicals originating from plant compounds. The iAbaylyiv1 model predicted 24 of them and missed 21 (see Figure 1).
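As a quick arithmetic check of the accuracy figures quoted for the Biolog comparison, the short Python sketch below recomputes them from counts reported in this article: 24 correctly predicted and 21 missed carbon sources, five of the 145 non-carbon sources wrongly predicted to support growth, and 173 correct predictions out of 190 after refinement. The function and variable names are illustrative and are not part of the original work.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of predictions that agree with the experimental result."""
    return (tp + tn) / (tp + tn + fp + fn)

# Counts reported in the article for the initial model (iAbaylyiv1).
true_pos = 24          # carbon sources correctly predicted
false_neg = 21         # carbon sources missed by the model
false_pos = 5          # non-carbon sources wrongly predicted as carbon sources
true_neg = 145 - false_pos

initial_accuracy = accuracy(true_pos, true_neg, false_pos, false_neg)

# After refinement, the article reports 173 correct predictions out of 190.
refined_accuracy = 173 / 190

print(f"initial: {initial_accuracy:.1%}")  # 86.3%
print(f"refined: {refined_accuracy:.1%}")  # 91.1%
```

Both values round to the 86% and 91% accuracies reported for the Biolog-measured phenotypes.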
Thirteen carbon source metabolites were unknown to the metabolic model. For two of them, sorbate and tricarballylate, we were able to identify degradation pathways and add them to the model (see Table 2). Sorbate, an unsaturated fatty acid, can be degraded by fatty acid oxidation enzymes, which were already included in the model for the degradation of other fatty acids. Sorbate transport and degradation reactions were therefore added to the model using the same set of genes. Recently, genes coding for tricarballylate transport (tcuC), oxidation to cis-aconitate (tcuA and tcuB), and for a regulatory protein required for tcuABC expression (tcuR) were identified in Salmonella enterica [28,29]. Highly homologous genes could be found in synteny in A. baylyi: ACIAD1536 (tcuB, 59% identity), ACIAD1537 (tcuA, 76% identity), ACIAD1541 (tcuC, 64% identity), ACIAD1539 (tcuR, 46% identity), and ACIAD1543 (tcuR, 44% identity). Following these clues, we expanded the model by implementing the corresponding transporter and degradation reaction, and annotated the corresponding genes. In four cases, dedicated growth experiments contradicted the Biolog result, weakening the case for further study (see Table 2). Finally, no relevant pathway could be found for the remaining seven unmodeled carbon sources. Further investigations are needed to identify the metabolic processes allowing A. baylyi to exploit these metabolites. Conversely, only five of the 145 non-carbon source metabolites were wrongly predicted to be carbon sources by the model: 4-hydroxybenzoate, D-fructose, L-arginine, L-ornithine, and D-serine (see Figure 1). Interestingly, the A. baylyi annotation describes a complete phosphotransferase (PTS) transport system for fructose (ACIAD1990 and ACIAD1993, fruA and fruB) coupled with a 1-phosphofructokinase (ACIAD1992, fruK) leading to fructose-1,6-bisphosphate (see Figure 5).
As is the case in E. coli, L-ornithine and L-arginine are degraded by A. baylyi using the arginine succinyltransferase (AST) pathway. This pathway allows E. coli to use them as nitrogen sources, but not as carbon sources. Putative explanations include unsuitable regulation and inadequate transport [31]. Similar reasons may explain A. baylyi's inability to use L-ornithine and L-arginine as carbon sources. A. baylyi's genome annotation includes genes for D-serine transport (ACIAD0118 and ACIAD2662, cycA) and D-serine deaminase activity (ACIAD1048, dsdA), which should allow it to use D-serine as a carbon and nitrogen source. The interpretation of this inconsistency is also unclear; a similar unexplained inconsistency was pointed out in a study involving a metabolic model of B. subtilis [4]. Improvements to the model resulted in iAbaylyiv2, raising predictive accuracy on Biolog-measured growth phenotypes from 86% to 91% (see Figure 1).

Systematic model improvement using gene essentiality data

In steps 2 and 3 of the model refinement process, we assessed and improved the model by comparing its predictions to experimentally determined gene essentialities, derived from the ADP1 mutant collection [8] (see Figure 1). We classified refinements into three categories according to the model component that was modified: GPR, NETWORK or BIOMASS (see Figure 2).

Model refinements

We performed two iterations of refinement using gene essentiality data (see Figure 1).
It is worth noting that the inconsistency results support our choice to include medium-confidence genes in the model. Genes associated with medium-confidence metabolic annotations did not trigger more inconsistencies than high-confidence genes. Of the reactions including at least one medium-confidence gene in their GPR, 18% (47/268) are associated with an inconsistent gene, a proportion similar to that of reactions containing only high-confidence genes (14%, 75/527). We examined the 91 inconsistent predictions of this step and refined the model for 47 of them (see Table 3 and below for details on the corrections). The refinements were implemented in iAbaylyiv3, increasing global accuracy from 88% to 94%. Improvement was most noticeable for essential genes, as 86% were correctly predicted by iAbaylyiv3. As discussed below, a high number of false isozymes, triggering false dispensable predictions, were detected in this refinement step.
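The vocabulary used for inconsistencies throughout this section can be made concrete with a small Python sketch. It labels each gene by comparing a predicted essentiality call with the observed one: a false essential prediction is a gene predicted essential but observed to be dispensable, and a false dispensable prediction is the reverse. The two ACIAD entries follow cases described in this article (ispA and ispU, as predicted before correction); `GENE_A` and `GENE_B` are hypothetical placeholders, and the function name is an assumption made for illustration.

```python
from collections import Counter

def classify(predicted_essential, observed_essential):
    """Label one model prediction against the experimental call."""
    if predicted_essential == observed_essential:
        return "consistent"
    # Predicted essential but the mutant grows: false essential.
    # Predicted dispensable but the gene is essential: false dispensable.
    return "false essential" if predicted_essential else "false dispensable"

# (gene, predicted essential?, observed essential?)
cases = [
    ("ACIAD2968", True, False),   # ispA, as described in the article
    ("ACIAD1374", False, True),   # ispU, as described in the article
    ("GENE_A", True, True),       # hypothetical placeholder
    ("GENE_B", False, False),     # hypothetical placeholder
]

labels = {gene: classify(pred, obs) for gene, pred, obs in cases}
summary = Counter(labels.values())
toy_accuracy = summary["consistent"] / len(cases)

for gene, label in labels.items():
    print(gene, label)
```

On a real dataset the same tally would give the global accuracy as the share of consistent calls; the value computed here only describes the toy list above.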
In a second step, the model was evaluated against growth phenotyping assays of mutants from the ADP1 collection on 8 minimal media supplemented with varying carbon and nitrogen sources (see Table 4 and Methods). Since all A. baylyi mutants were first obtained on a succinate-supplemented minimal medium, essentialities revealed by these assays were strictly conditional. Furthermore, as the succinate-supplemented medium was already minimal, the set of conditionally essential genes was restricted to the genes directly related to the use of the tested carbon and nitrogen sources. These were chosen to involve different parts of A. baylyi secondary metabolism (see Table 4). Overall, 455 knockout mutants corresponding to genes in the model could be phenotyped (see Figure 1).
Phenotyping experiments pointed out 2 to 10 conditionally essential genes (from the set of model genes) on each medium (Table 4). While a majority of these genes were essential on a single medium, some were found to be conditionally essential on several media. This revealed interdependencies between environments and might be related to processes specific to groups of environments. For instance, growth phenotypes on 2,3-butanediol and acetate exhibit similar characteristics since 2,3-butanediol is converted to acetate for its utilization [8]. The use of acetate as a carbon source requires the activation of the glyoxylate shunt, catalyzed by ACIAD1084 (isocitrate lyase) and ACIAD2335 (malate synthase G). These genes were therefore found to be essential on 2,3-butanediol and acetate only. Accordingly, the metabolic model correctly predicted the required use of this pathway and the subsequent essentiality of these genes on these media (see Figure 1).

Combining both refinement steps, 56 out of 124 inconsistencies led to model corrections. In the following sections, we discuss these gene essentiality inconsistencies in more detail, irrespective of the dataset that triggered them (see also Table 3). Model corrections are presented according to the model component that was modified.

GPR corrections

A majority of the model improvements (34/56) were applied to the GPR component, with a clear bias towards false dispensable inconsistencies: 26 GPR corrections pertained to experimentally essential genes against only 8 to experimentally dispensable genes (see Table 3). This large set of false dispensable predictions includes two main inconsistency types. In 22 cases, isofunctional genes with annotations of medium confidence were in fact unable to replace the activity of their deleted isozymes.
For instance, ACIAD0964 and ACIAD2907 (prs) were identified in the initial reconstruction as isozymes for the catalysis of the ribose-phosphate diphosphokinase activity, which is required for the biosynthesis of 5-phosphoribosylpyrophosphate (PRPP) (see Figure 8A).
Further examination revealed that the duplicate genes are also found together in other organisms, including Bradyrhizobium japonicum and Bordetella bronchiseptica, and that S. cerevisiae possesses the gene ILV3, with a confirmed activity [34], which is homologous to ACIAD3636 (51% identity). Overall, amongst the reactions which were essential to iAbaylyiv2 viability and associated with an isozyme of medium confidence level, 8 showed agreement between predictions and phenotypes while 11 triggered inconsistencies. In other words, while some medium-confidence genes were discarded thanks to essentiality data, a comparable fraction of genes was indirectly confirmed. This observation provides additional confirmation that essentiality data represent a valuable resource, as they help validate or discard gene functions supported by reasonably good but non-conclusive evidence. It also provides an a posteriori validation of the usefulness of including medium-confidence annotations in the initial model, as failing to do so would have resulted in a significant loss of information in the A. baylyi metabolic model. For three false dispensable predictions, we uncovered enzymatic complexes or functional dependencies between genes that were absent from the initial reconstruction: genes thought to be isozymes were in fact jointly required to catalyze the reactions. As an illustration, ACIAD0661 (hisG) and ACIAD1257 (hisZ) were initially assigned as isozymes of the ATP phosphoribosyltransferase reaction in the pathway of histidine biosynthesis (see Figure 8A). Amongst the false essential predictions which led to modifications of the GPR component, six cases involved associating additional enzymes with reactions.
For instance, ACIAD2968 (ispA, farnesyl diphosphate synthase) was observed to be dispensable, even though it is the only catalyst of two reactions essential for the biosynthesis of isoprenoids, which are the precursors of vital cofactors (see Figure 8B). The remaining types of GPR refinement involved associating genes with already existing essential reactions (ACIAD2606: associated with nicotinate-nucleotide adenylyltransferase activity, which is essential for NAD biosynthesis), adding new complex subunits (ACIAD0799: falsely considered as a sulfite reductase subunit and replaced by ACIAD2981 after further investigations) or assigning spontaneous activity (ACIAD2819: encodes gluconolactonase activity, which has been shown to occur spontaneously [37]). See Additional file 3 for further details on these corrections.

NETWORK corrections

Twelve gene essentiality inconsistencies from datasets 2 and 3 led us to improve the NETWORK component of the model (see Table 3). Two types of inconsistencies fall within this category. On the one hand, false dispensable predictions may indicate that alternate pathways present in the model are either inactive under the experimental conditions under observation or not present at all. Seven discrepant predictions led us to reconsider alternate pathways in the model. For instance, ACIAD0822, ACIAD0823, and ACIAD0824 (gatABC), annotated as aspartyl/glutamyl-tRNA amidotransferase, catalyzed in iAbaylyiv2 the synthesis of charged glutamine-tRNA and charged asparagine-tRNA through the transamidation of misacylated glutamate-tRNA(Gln) and aspartate-tRNA(Asn) (see Figure 8C). On the other hand, false essential predictions may suggest that alternate pathways are missing from the model. Corrections of this type involve searching for new metabolic activities, a task that is open-ended and exploratory in nature and is likely to require additional experimental work.
Five inconsistencies led to the addition of new reactions to the model, mainly for the transport of metabolites.

BIOMASS corrections

Ten inconsistent gene essentiality predictions led to modifications of the BIOMASS component (see Table 3). False essential inconsistencies can reveal biomass precursors that are not necessary to the viability of the cell on the tested environments, yet commonly produced by the wild-type strain. For instance, a large fraction of the BIOMASS modifications (8/10) were found in the biosynthesis of polysaccharides. Based on studies of the lipopolysaccharide composition of Acinetobacter species [38,39], three nucleotide sugars were initially included in the list of essential biomass precursors. All genes specifically involved in the synthesis of these sugars were found to be dispensable for growth on these in vitro environments (see Figure 8D). Conversely, false dispensable inconsistencies may uncover essential metabolites that were initially overlooked. For instance, undecaprenyl diphosphate, a cofactor required for the synthesis of peptidoglycan, was not part of the biomass precursors list in iAbaylyiv2. ACIAD1374 (ispU, undecaprenyl pyrophosphate synthetase), involved in its synthesis, was observed to be essential, although predicted dispensable (see Figure 8B).

Interpretation of remaining inconsistencies

The analysis of inconsistent predictions did not always lead to model refinement. Either the explanation of the discrepancy did not call for a change to the model, or no explanation of the discrepancy could be validated. Six discrepancies were confidently interpreted yet did not lead to model modifications (see Table 3). In one case, we identified a wrong experimental result. Four inconsistencies pertained to the pathway of biotin synthesis, whose essentiality could not be accounted for by the model.
Since the initial step of this pathway is unknown, it could not be linked to the metabolic network, preventing the model from simulating biotin synthesis. One inconsistency was caused by a requirement for a cofactor that could not be modeled. Two different methionine synthase enzymes catalyze the conversion of homocysteine to methionine: one B12-independent, encoded by ACIAD3523 (metE), and one B12-dependent, encoded by ACIAD1045 (metH). Since coenzyme B12 is neither synthesized by A. baylyi nor provided in the experimental media, the ΔACIAD3523 mutant was unable to use the MetH enzyme to synthesize methionine. The model could not account for this B12 auxotrophy of the ΔACIAD3523 mutant. In order to properly account for the dependency between MetH activity and the presence of a cofactor, the replenishing flux method can be employed [27], or the modeling framework could be extended by introducing rules that state which conditions are required for the enzymes to be active. The introduction of this additional layer of rules has already been proposed to account for regulatory constraints [41] and may be helpful to explain a number of inconsistent phenotypes. For 62 inconsistencies, we could not reach a validated explanation within the scope of this global analysis (see Table 3). For 32 of them, we could formulate hypothetical interpretations, all of which need experimental confirmation. A high proportion of these possible interpretations involve regulatory processes. For instance, like E. coli, A. baylyi possesses two distinct enzymes for glutamate synthesis: glutamate synthase, encoded by ACIAD3350 (gltB) and ACIAD3349 (gltD), and glutamate dehydrogenase, encoded by ACIAD1110 (gdhA). In E. coli, these pathways were shown to be regulated in response to nitrogen limitations [42]: glutamate synthase is used at low ammonium concentrations while glutamate dehydrogenase is used at high ammonium concentrations. E.
coli strains lacking glutamate synthase show severe growth deficiency at low ammonium concentrations [42]. Similarly, ACIAD3350 and ACIAD3349 were found to be essential in A. baylyi on the succinate-supplemented minimal medium. These phenotypes contradicted model predictions, which considered the alternate pathway for glutamate synthesis. Further investigation would be required to fully understand the regulatory processes at work in this pathway for A. baylyi, and the modeling framework would need to be extended to account for regulatory processes within the model. The remaining 30 inconsistencies could not be given a clear interpretation and also require further investigation.

The final model: iAbaylyiv4

The overall refinement process led to the final model iAbaylyiv4, gathering 774 genes, 875 reactions and 701 metabolites (see Figure 1).

An online software tool for the exploration of Acinetobacter baylyi metabolism

In order to facilitate the exploration of A. baylyi metabolism using the genome-scale model, we created NemoStudio [20] (Combe et al, in preparation), a web interface combining a simulation layer for the model with AcinetoCyc, the A. baylyi Pathway-Genome Database [21]. NemoStudio gathers data on functional genomics annotations, metabolic reactions and pathways, and experimental mutant phenotyping results within a single interface. Additionally, it allows phenotype predictions to be performed using the constraint-based model. AcinetoCyc gathers information on the metabolic network of A. baylyi and is used to display interactive metabolic maps. After its initial automated construction using PathoLogic [21], AcinetoCyc has been undergoing constant curation. It includes all metabolic reactions present in the model. NemoStudio integrates the latest version of the A. baylyi metabolic model, iAbaylyiv4. Growth phenotype predictions can be performed for any set of environmental conditions and genetic perturbations of this study.
We implemented both Flux Balance Analysis (FBA) and Metabolite Producibility methods to predict growth phenotypes (see Methods). When performed on sets of environmental conditions and sets of gene deletions, prediction results are displayed in a table format in parallel to the actual experimental results. Predictions can thus be readily compared with the experimental observations. Furthermore, predicted and experimental phenotypes are both displayed on AcinetoCyc metabolic maps, and conversely gene deletions can be directly set from these metabolic maps (see Figure 9).
The availability of this resource as a web interface makes it easily usable by scientists interested in A. baylyi metabolism. Compared with previous web-based software for genome-scale metabolic modeling [27], the A. baylyi NemoStudio interface provides better interactivity, direct visualization of results on metabolic maps and integrated comparison with experimental data. By linking results from systems-level analyses as closely as possible with experimental data of various forms, it allows the simultaneous exploitation of both information types.

Conclusion

In this work, we reconstructed a genome-scale model of Acinetobacter baylyi metabolism from the annotation of its genome, metabolic knowledge reported in the literature, and results of high-throughput experiments. The model provides a curated and structured representation of this species's metabolism for use both as a reference and as a foundation for further study. The reconstruction accounts for 875 reactions, 701 distinct metabolites, and 774 genes, and includes nearly all metabolic routes and biochemical conversions identified for A. baylyi. A significant proportion of reactions belong to pathways of secondary metabolism that are characteristic of A. baylyi's physiology and lifestyle. The model thus reflects the specific ability of A. baylyi to utilize various chemicals originating from plant metabolism, e.g. aromatic acids, hydroxylated aromatic acids, or straight-chain dicarboxylic acids. It may assist or even drive future investigations on this bacterium, helping for instance to interpret other types of experimental data beyond growth phenotypes, or to engineer its metabolism. An increasing number of metabolic engineering strategies are being designed with the help of genome-scale metabolic model predictions [43,44]: the availability of the A. baylyi model should facilitate efforts towards biotechnology goals. The A.
baylyi model may also serve as a basis for the reconstruction of metabolic models of the pathogenic Acinetobacter baumannii strains. These strains, which are involved in serious nosocomial infections worldwide and have acquired multidrug-resistance capabilities [13], share a significant number of metabolic genes with A. baylyi [45]. This model is also the fourth genome-scale bacterial metabolic model to be accompanied by an exhaustive mutant library (with E. coli [5,12], Bacillus subtilis [4,6], and Pseudomonas aeruginosa PAO1 [46,47]). The proximity between A. baylyi and P. aeruginosa, and to a lesser extent E. coli, and the availability of model/mutant library pairs provide an invaluable setup for comparing the metabolism of different species [8]. Several rounds of comparisons of model predictions to large-scale experimental results led to significant model improvements. First, growth phenotypes of the wild-type strain on 190 distinct environments resulted in the addition of 9 transporters and 2 pathways to the model. After improvement, the model accounted correctly for the growth phenotypes on 173 of the 190 environments. Secondly, we assessed the model against gene essentiality results on 9 defined environments. In contrast with wild-type growth phenotypes, these data can bring indirect information on gene functions or on the existence of alternate pathways. Investigation of the causes of inconsistencies led us to modify the model in 56 cases out of 124 inconsistent predictions. All model components were modified, the GPR component gathering most of the improvements. The model accuracy in predicting mutant growth phenotypes increased from 88% to 94% on succinate-supplemented minimal medium and from 93% to 94% for the combined conditional gene essentiality results on 8 media. High-throughput phenotyping clearly improved the quality of the model and expanded our understanding of A.
baylyi metabolism, providing a valuable complement to the annotation and the literature. The refinement process was particularly useful in validating or contradicting functional annotations that stood in the "grey zone", i.e. for which the annotation process provided only medium-level evidence. Conversely, the model allowed systematic evaluation of the results of these high-throughput experiments by comparing them to its predictions. Inconsistencies directly targeted informative experimental results for which further investigation is required. As shown in this work, not all inconsistencies led to model improvements. Some of them could be interpreted in terms of biological processes lying outside the scope of the modeling framework, probably regulation in most cases. In addition, a significant number of discrepancies reported in this work remained unexplained or led to hypotheses in need of confirmation through further study. The process described here was driven by expert curation: each inconsistency was manually examined in order to search for an interpretation and a possible model correction, a labor-intensive proposition. The systematic use of such experimental data for model refinements would, however, be greatly facilitated by the development of computational methods assisting the curator with this task. A number of methods have been developed to search for model variants that better match additional experimental data, mainly by seeking additions or removals of reactions in the metabolic network [48,49]. These methods have already proven efficient at suggesting metabolic pathways that account for previously unexplained growth on specific environments [48]. While they can be adapted to handle growth phenotypes of knockout mutant strains, they do not involve the gene-reaction association component of the model, which is shown here to be the main area of model improvement.
The association between genes and reactions can be complex, as regulatory constraints may interfere with the actual gene function assignments. Computational strategies are therefore needed to help interpret the consequences of gene essentiality data on gene activities.

Deriving the full benefit from a metabolic model entails both accessing its components and using its predictive capabilities. We achieved the former by providing access to a detailed metabolic pathway database and the latter through a software tool that performs online predictions; the two are coupled at the level of genes and reactions and accessible through a single, highly interactive interface. This interface allows end users to carry out systems-level predictions and compare them with the corresponding experimental observations, putting the consequences of modeling in the context of the detailed biological information that went into the model. This tool should therefore provide researchers interested in A. baylyi metabolism with a valuable resource for investigating its phenotypic and physiological properties.

Methods

Initial reconstruction process

The initial reconstruction of the metabolic network was carried out using data provided by (i) the expert genome annotation [19], (ii) the BioCyc metabolic pathway database automatically generated from these annotations [21], and (iii) various literature resources on biochemistry, including textbooks, reviews, and journal publications (see Additional file 2). The genome annotation was downloaded from the MaGe interface [50,51] and used as input to the Pathway Tools software [21] in order to generate an automatic BioCyc reconstruction of the metabolic network. The predicted pathways were classified into 7 metabolic categories (central metabolism, nucleotide metabolism, amino acid metabolism, lipid & cell wall metabolism, degradation pathways, cofactor biosynthesis, transport) and examined manually before being included in the model.
To meet the requirements of the modeling framework, the mass balance and reversibility of the reactions were checked. Reversibility was determined from literature evidence when available, or otherwise from simple thermodynamic considerations [52]. Proton translocation efficiencies of the reactions of the respiratory chain were assumed to be similar to those of E. coli [53]. The resulting P/O ratio can range from 0.5 to 2, depending on the types of cytochrome oxidase and NADH dehydrogenase that are used. Reactions using generic compounds (for example a nitrile or a polymer of undetermined length) were instantiated with defined representative metabolites. Accordingly, polymeric pathways were expanded into chains of specific reactions. Large polymeric molecules such as the acyl carrier protein (ACP) or tRNAs were included in the model when they were involved as substrate cofactors of biochemical reactions. Their specific synthesis was not considered in the model.

Dependencies between reactions and genes were encoded as Gene-Protein-Reaction (GPR) Boolean relationships (see below). Using the Cyclone interface to BioCyc [54], we implemented a simple method based on gene homologies between Escherichia coli and Acinetobacter baylyi to infer enzyme complexes and find AND Boolean associations between genes.

Information from the literature was used to close gaps in the metabolic pathways, to include pathways specific to A. baylyi that were unknown to the metabolic databases, and to check the predicted pathways, for instance for the specificity of the cofactors. Physiological information derived from the literature [15,55-59] was used together with genome annotation tools, e.g. TransportDB [60], to add transport reactions to the model. A generic transport reaction was added to the model for each metabolite shown to be utilized by A. baylyi.
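
The mass-balance check applied to reactions during reconstruction can be illustrated with a minimal Python sketch. It is not the authors' tooling: the function names, the dictionary layout, and the metabolite identifiers are assumptions made for illustration. The example reaction is the textbook hexokinase reaction, written with the formulas of the charged species, and the sketch only handles simple formulas without parentheses.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Turn a simple chemical formula such as 'C6H12O6' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def element_imbalance(stoichiometry, formulas):
    """Return the net element counts of a reaction.

    `stoichiometry` maps metabolite ids to coefficients (negative for
    substrates, positive for products). An empty dict means the reaction
    is mass balanced.
    """
    net = Counter()
    for metabolite, coefficient in stoichiometry.items():
        for element, count in parse_formula(formulas[metabolite]).items():
            net[element] += coefficient * count
    return {element: n for element, n in net.items() if n != 0}


# Illustrative formulas (charged species) for the hexokinase reaction.
FORMULAS = {
    "glc": "C6H12O6",
    "atp": "C10H12N5O13P3",
    "g6p": "C6H11O9P",
    "adp": "C10H12N5O10P2",
    "h": "H",
}

# glucose + ATP -> glucose 6-phosphate + ADP + H+
hexokinase = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1, "h": 1}

# Same reaction with the proton forgotten: one hydrogen is missing on the
# product side, which the check reports as a net deficit.
hexokinase_without_proton = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1}

balanced_report = element_imbalance(hexokinase, FORMULAS)
unbalanced_report = element_imbalance(hexokinase_without_proton, FORMULAS)
```

A curator would typically run such a check over every reaction and inspect only those with a non-empty report, which is where missing protons or water molecules tend to show up.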
A fixed biomass composition was chosen according to data found in the literature for strains growing on standard media (see Additional file 4). This biomass composition was used to build the reduced list of essential biomass precursors and to derive a biomass reaction for Flux Balance Analyses (see below). To help properly account for all metabolic requirements associated with growth, we decomposed the biomass reaction into a set of intermediary biomass reactions, which synthesize generic cell constituents (e.g. protein, DNA, RNA, or lipid) from precursor metabolites, and a global growth reaction, which consumes these constituents according to the chosen biomass composition. See Additional file 4 for details on these reactions.

Modeling framework

The metabolic model is composed of three components, namely GPR, NETWORK and BIOMASS.

The GPR component models the dependency between genes and reactions using Boolean functions usually called gene-protein-reaction (GPR) associations [22]. For each reaction, a Boolean rule encodes how genes are related to its activity. Genes that are required together are linked with an AND relation, while isofunctional genes are linked with an OR relation. The set of GPR associations yields the set of potentially active reactions given the set of available genes.

The NETWORK component models the metabolic network using the constraint-based modeling framework [3]. This framework describes the distributions of reaction fluxes that are compatible with constraints deriving from basic physical assumptions or specific biological information. These constraints are usually formulated as linear constraints, which allows the flux solution space to be explored using linear programming tools. The main constraint is imposed by the steady-state assumption, represented by the matrix equation:

$$S \cdot \nu = 0$$
where S is the stoichiometric matrix of the metabolic network and ν the vector of reaction fluxes. The stoichiometric matrix has size (m × n), where m is the number of metabolites and n the number of reactions. Each element $S_{i,j}$ of the matrix represents the relative stoichiometric coefficient of metabolite i in reaction j. Additional constraints on the fluxes, such as irreversibility and capacity constraints, are imposed by inequalities of the form:

$$\nu_{lb,i} \leq \nu_i \leq \nu_{ub,i}$$
where $\nu_{lb,i}$ and $\nu_{ub,i}$ are respectively the lower and upper bounds of the flux of reaction i.

Environmental conditions are applied to the model by constraining the exchange fluxes of extracellular metabolites. Exchange fluxes are carried by sink reactions that control the input or output of metabolites in the model. They are constrained to $0 \leq \nu_i \leq \infty$ for metabolites absent from the medium and $-\infty \leq \nu_i \leq \infty$ for metabolites present in the medium, except for limiting nutrients, for which a maximum uptake rate is chosen ($-\nu_{uptake} \leq \nu_i \leq \infty$). When simulating the metabolic network of a knockout mutant, the activity of each reaction is determined by evaluating its GPR association against the set of removed genes. Fluxes of the inactivated reactions are constrained to zero.

The BIOMASS component models the essential metabolic requirements for growth. It consists of a list of metabolites that are considered to be essential biomass precursors. The growth phenotype is therefore determined by checking their producibility [26]. To do so, the steady-state constraints for the essential biomass precursors are changed to strict producibility constraints:

$$S_{\text{internal}} \cdot \nu = 0, \qquad S_{\text{biomass precursors}} \cdot \nu \geq \varepsilon$$

where $S_{\text{internal}}$ is the stoichiometric matrix without the biomass precursors, $S_{\text{biomass precursors}}$ the stoichiometric matrix restricted to the biomass precursors, and ε a vector of small reals, taken as $10^{-3}$. Linear programming tools are used to query for a flux distribution fulfilling this set of constraints. If a flux distribution could be found, the model predicted growth; otherwise it predicted no growth.

In order to assess quantitative growth defects, Flux Balance Analyses (FBA) were performed [3]. A biomass reaction was introduced in the model to quantitatively account for the respective contributions of constituent metabolites in the biomass composition (see Additional file 4).
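
As an illustration of the constraint setup described in this section (medium-dependent exchange bounds and GPR-based inactivation of reactions in knockout mutants), the following Python sketch may help. It is a toy example under stated assumptions, not the authors' implementation: the nested-tuple encoding of GPR rules, the function names, and the gene, reaction, and metabolite identifiers are all hypothetical. The resulting bounds would then be passed to a linear programming solver, which is not shown here.

```python
import math


def reaction_is_active(rule, deleted_genes):
    """Evaluate a GPR rule against a set of deleted genes.

    A rule is either a gene identifier (str) or a tuple whose first item
    is "and" or "or" and whose remaining items are sub-rules.
    """
    if isinstance(rule, str):
        return rule not in deleted_genes
    operator, *subrules = rule
    results = (reaction_is_active(sub, deleted_genes) for sub in subrules)
    if operator == "and":   # genes required together (e.g. complex subunits)
        return all(results)
    if operator == "or":    # isofunctional genes
        return any(results)
    raise ValueError(f"unknown GPR operator: {operator!r}")


def knockout_bounds(bounds, gpr_rules, deleted_genes):
    """Copy the flux bounds, forcing inactivated reactions to zero flux."""
    new_bounds = dict(bounds)
    for reaction, rule in gpr_rules.items():
        if not reaction_is_active(rule, deleted_genes):
            new_bounds[reaction] = (0.0, 0.0)
    return new_bounds


def exchange_bounds(metabolite, medium, limiting_uptake):
    """Bounds of an exchange flux (negative values denote uptake)."""
    if metabolite in limiting_uptake:      # limiting nutrient
        return (-limiting_uptake[metabolite], math.inf)
    if metabolite in medium:               # present, not limiting
        return (-math.inf, math.inf)
    return (0.0, math.inf)                 # absent: secretion only


# Hypothetical toy model: one enzyme complex, one pair of isozymes, and
# one reaction without any gene association.
gpr_rules = {
    "R_complex": ("and", "geneA", "geneB"),
    "R_isozymes": ("or", "geneC", "geneD"),
}
bounds = {
    "R_complex": (0.0, math.inf),
    "R_isozymes": (-math.inf, math.inf),
    "R_spontaneous": (0.0, math.inf),
}

mutant_bounds = knockout_bounds(bounds, gpr_rules, {"geneA"})
```

In this toy model, deleting `geneA` blocks `R_complex`, whereas deleting `geneC` alone leaves `R_isozymes` active because `geneD` can substitute for it. Reactions without a GPR rule are never inactivated by a gene deletion.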
Using linear programming, the flux through the biomass reaction was maximized under all constraints, yielding the maximal growth rate achievable by the model. Energetic parameters, including growth-associated (GAM) and non-growth-associated (NGAM) maintenance fluxes, were assumed to be similar to those of the E. coli model [22]. We set NGAM to a constant ATP hydrolysis flux of 10 mmol/h/gDW and GAM to 40 mmol/gDW of ATP in the growth reaction. In all simulations, upper bounds of nutrient exchange fluxes were set to 10 mmol/h/gDW for carbon sources and 100 mmol/h/gDW for other nutrients (see Additional file 2).

Availability of metabolic model

Growth phenotyping of the wild-type strain

Growth phenotyping experiments of A. baylyi were performed by Biolog, Inc. (Hayward, CA) following the experimental procedures described in [66]. Briefly, growth of wild-type strains of A. baylyi was monitored in PM1 and PM2 microplates containing a defined minimal medium supplemented with 190 distinct carbon sources. The quantitative Biolog growth measures were discretized into qualitative growth/no-growth phenotypes by choosing thresholds based on the negative growth control measures and previously known growth phenotypes of A. baylyi. Growth phenotypes that were inconsistent with model predictions were checked by examining results from previous work [15] or by retesting them individually. Detailed results of the Biolog experiments are provided in Additional file 3.

Growth phenotyping of the mutant strains

The detailed experimental protocol for the growth phenotyping of the mutant strains is described in [8]. Briefly, using 96-well plates, the mutant strains were grown in liquid MA minimal media (31 mM Na2HPO4, 25 mM KH2PO4, 18 mM NH4Cl, 41 μM nitrilotriacetic acid, 2 mM MgSO4, 0.45 mM CaCl2, 3 μM FeCl3, 1 μM MnCl2, 1 μM ZnCl2, 0.3 μM (CrCl3, H3BO3, CoCl2, CuCl2, NiCl2, Na2MoO4, Na2SeO3)) supplemented with 25 mM of carbon sources.
Succinate/urea medium was composed of MA minimal medium without NH4Cl, supplemented with 25 mM of succinate and 20 mM of urea. Absorbance at 600 nm of 24 h cultures was measured to monitor growth. Experiments were performed in duplicate. Measures with discrepant repeats or with weak precultures were discarded from the analyses. Repeats were filtered according to the following rule: a measure was kept if either (1) both repeats were under the growth threshold or (2) the relative difference between the repeats was lower than 50% of the highest value. A threshold of a tenth of the mean absorbance was chosen to classify the mutants into growth or no-growth categories. This threshold was deliberately set low so that only mutants with a marked fitness defect were classified as essential.

Authors' contributions

MD reconstructed the initial model, performed model predictions, interpreted inconsistent phenotypes, applied model corrections, and wrote the manuscript. FLF reconstructed the initial model and developed the NemoStudio software tool. VDB participated in the experimental phenotyping and the interpretation of inconsistent phenotypes. AK and DV participated in the initial reconstruction and the interpretation of inconsistent phenotypes. CC and SS developed the NemoStudio software tool. MS participated in the experimental phenotyping and the interpretation of inconsistent phenotypes. JW participated in the design and the coordination of the study. VS conceived of the study, participated in its design and coordination, and contributed to writing the manuscript. All authors read and approved the final manuscript.

Additional file 1

Sensitivity of growth rate predictions to GAM and NGAM parameters. This file contains two plots showing the effect of changing growth-associated (GAM) and non-growth-associated (NGAM) maintenance parameters on quantitative growth rate predictions with iAbaylyiv4. (59K, pdf)

Additional file 2

Genome-scale metabolic models.
This file contains the description of all model versions as well as information on reactions, species, biomass precursors, modeled environments and literature references used for the model reconstruction. (1.6M, xls)

Additional file 3

Experimental data and model refinements. This file gathers the experimental results used for model refinements, the model predictions, and the corrections/interpretations associated with the inconsistent predictions. (857K, xls)

Additional file 4

Determination of biomass composition of A. baylyi. This file gathers all information used to reconstruct the biomass assembly reactions in the metabolic model. (109K, xls)

Additional file 5

Genome-scale metabolic model in SBML format. This file contains the latest model, iAbaylyiv4, in SBML format (http://www.sbml.org). (1.6M, xml)

Acknowledgements

We would like to thank Pierre-Yves Bourguignon for comments and insightful discussions on this work. We are also grateful to Georges Cohen, Cécile Fischer, Alain Perret, and Agnès Pinet for their help on the fine points of A. baylyi biochemistry. We wish to thank the reviewers for their help in improving the manuscript. We are grateful for the support of the European Networks of Excellence BIOSAPIENS (contract LSHG-CT-2003-503265) and ENFIN (contract LSHG-CT-2005-518254).

References