NIH Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
Nat Methods. Author manuscript; available in PMC Jul 19, 2011.
Published in final edited form as:
PMCID: PMC3139173
NIHMSID: NIHMS305996

Metabolic network analysis integrated with transcript verification for sequenced genomes

Abstract

With sequencing of thousands of organisms completed or in progress, there is a growing need to integrate gene prediction with metabolic network analysis. Using Chlamydomonas reinhardtii as a model, we describe a systems-level methodology bridging metabolic network reconstruction with experimental verification of enzyme encoding open reading frames. Our quantitative and predictive metabolic model and its associated cloned open reading frames provide useful resources for metabolic engineering.

Present availability of genome sequences for diverse microorganisms brings opportunities for metabolic engineering through systems-level characterization of these organisms’ metabolic networks1. Such efforts require both functional and structural annotation of metabolic components encoded within these genomes. Although advances have been made in defining transcribed protein coding sequences for widely studied eukaryotes, notable deficiencies in genome annotation remain2. These problems are evident in the genomes of less widely studied species for which comparative genomic information is scarce. Structural annotations of boundaries for many genes in newly sequenced genomes are often poorly defined because of incomplete understanding of transcriptional-initiation, termination and splicing rules, and deficiencies in gene-prediction algorithms3. Genes with valid structural annotations lack thorough functional annotations linking transcripts to enzymatic or regulatory activities of corresponding proteins4.

Given the close relationship between gene annotation and metabolic network reconstruction1,5, we propose a targeted iterative methodology, integrating experimental transcript verification with genome-scale computational modeling (Fig. 1). An initial metabolic network, generated using literature sources and bioinformatics-generated functional annotation, served to identify C. reinhardtii genes in need of experimental definition and validation. We performed reverse-transcription PCR (RT-PCR) and rapid amplification of cDNA ends (RACE) to verify existence of hypothetical transcripts and to refine structural annotations. We used the results of transcript verification experiments to refine the metabolic model, with a focus on eliminating reactions associated with experimentally unverified transcripts. We filled resulting gaps in pathways by incorporating alternative sets of enzymes and by applying more detailed functional annotation to identify transcript models associated with necessary reactions. We also added and expanded pathways to yield a more complete metabolic model, providing the basis for another round of transcript verification and network modeling. Iterative refinement continued until the network and its associated genes were fully developed and validated.
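The cycle described above can be sketched in a few lines of Python. This is only an illustrative outline, not the pipeline used in the study: the `Reaction` record, the `verify` callback (standing in for an RT-PCR or RACE outcome), the `alternatives` table and the transcript identifiers `t1` to `t3` are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Reaction:
    name: str          # reaction label in the model
    ec: str            # EC term attached to the reaction
    transcripts: list  # transcript models annotated with that EC term


def split_by_evidence(model, verified):
    """Separate reactions backed by a verified transcript from gaps."""
    kept = [r for r in model if any(t in verified for t in r.transcripts)]
    gaps = [r for r in model if not any(t in verified for t in r.transcripts)]
    return kept, gaps


def iterate(model, verify, alternatives, max_rounds=5):
    """Alternate transcript verification and model refinement until no gaps remain."""
    pending = dict(alternatives)  # unused alternative assignments, keyed by reaction name
    for round_no in range(1, max_rounds + 1):
        tested = {t for r in model for t in r.transcripts}
        verified = {t for t in tested if verify(t)}  # stands in for RT-PCR / RACE
        kept, gaps = split_by_evidence(model, verified)
        if not gaps:
            return kept, round_no
        # Gap filling: swap in an alternative transcript assignment when one exists;
        # reactions with no alternative are dropped from the model.
        filled = [pending.pop(g.name) for g in gaps if g.name in pending]
        model = kept + filled
    return model, max_rounds


# Hypothetical example data (transcript IDs t1..t3 are invented).
draft = [Reaction("PFK", "2.7.1.11", ["t1"]), Reaction("ICL", "4.1.3.1", ["t2"])]
alternatives = {"PFK": Reaction("PFK", "2.7.1.11", ["t3"])}
final, rounds = iterate(draft, lambda t: t in {"t2", "t3"}, alternatives)
print(rounds, [(r.name, r.transcripts) for r in final])
```

In this toy run the first round leaves the PFK reaction without a verified transcript, the alternative assignment is swapped in, and the second round finds no remaining gaps.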

Figure 1
Assessing and improving gene annotation for C. reinhardtii: iterative process integrating gene annotation experiments with metabolic network reconstruction and analysis. Starting with a draft network reconstruction, EC terms associated with model reactions ...

To begin our iterative process, functional annotation was needed for the current C. reinhardtii genome sequence. Because Enzyme Commission (EC) annotation was only available for a previous version of the genome (Joint Genome Institute (JGI) v3.0), we generated our own annotations (Supplementary Note and Supplementary Figs. 1,2). Using the publicly available C. reinhardtii version 3.1 transcripts (JGI v3.1, ftp://ftp.jgi-psf.org/pub/JGI_data/Chlamy/v3.1/Chlre3_1.fasta.gz), we assigned EC numbers by basic local alignment search tool (BLAST) sequence comparison of in silico–translated v3.1 transcripts against UniProt-SwissProt6 and the complete Arabidopsis thaliana proteome dataset. Our new annotation (Supplementary Table 1) included EC terms missing from existing annotation, yielding functional differences in metabolic pathways (Fig. 2a,b). For example, six EC terms used for production of triacylglycerol, a glyceride of interest for biofuel purposes, were included in our new annotation but not in existing annotations (Supplementary Table 2).
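A minimal sketch of EC transfer from a best BLAST hit is given below. It assumes tab-delimited BLAST output with the 12 standard columns (query, subject, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, E value, bit score); the E-value cutoff, the decision to ignore reference proteins that carry no EC number, and all identifiers (`tx1`, `REF_A`, and so on) are assumptions made for illustration and are not taken from the study.

```python
import csv
import io


def assign_ec(blast_tsv, ref_ec, max_evalue=1e-5):
    """Give each query the EC number of its highest-scoring EC-annotated hit."""
    best = {}  # query -> (bit score, subject)
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        # Skip weak hits and reference proteins without an EC annotation.
        if evalue > max_evalue or subject not in ref_ec:
            continue
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {q: ref_ec[s] for q, (_, s) in best.items()}


# Hypothetical hits: tx1 matches two references, tx2 only has a weak hit.
HITS = "\n".join([
    "tx1\tREF_A\t62.0\t300\t100\t2\t1\t300\t1\t300\t1e-80\t250.0",
    "tx1\tREF_B\t40.0\t280\t150\t5\t1\t280\t1\t280\t1e-30\t120.0",
    "tx2\tREF_B\t35.0\t200\t120\t4\t1\t200\t1\t200\t0.5\t30.0",
])
REF_EC = {"REF_A": "2.7.1.11", "REF_B": "4.1.3.1"}
print(assign_ec(HITS, REF_EC))
```

Here `tx1` inherits the EC number of `REF_A`, its higher-scoring hit, and `tx2` stays unannotated because its only hit fails the cutoff.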

Figure 2
Integrating the network model with transcript verification experiments. (a) Comparison of central metabolic EC terms annotated in existing JGI v3.0 and our annotation of JGI v3.1 (Supplementary Note). (b) Applying these two versions of EC annotation to ...

Having assigned EC annotation for the translated JGI v3.1 transcripts, we generated a central metabolic network reconstruction of C. reinhardtii, integrating literature-sourced data with our newly generated EC annotation of JGI v3.1. We used the Kyoto Encyclopedia of Genes and Genomes (KEGG), Expert Protein Analysis System (ExPASy) and literature sources to delineate pathway structure and reaction stoichiometry. The resulting metabolic network model specified the full stoichiometry of central metabolism in C. reinhardtii, accounting for all cofactors and metabolite connections1, with reactions localized to the cytosol, mitochondria, chloroplast (including the lumen as a subcompartment for photosynthesis), glyoxysome and flagellum. We obtained the localization evidence mainly from literature and supplemented it by subcellular localization predictions7. We established transport reactions using literature-sourced evidence where possible, supplementing it with information from online databases where appropriate. Of the 69 unique EC terms contained within the initial reconstruction and used to guide transcript verification experiments (Supplementary Table 3), all but four were annotated in the C. reinhardtii v3.1 proteome. The missing EC terms (1.1.1.28, 1.2.7.1, 1.3.99.1 and 6.2.1.5) could be assigned to homologous C. reinhardtii proteins but matched better to reference proteins bearing different EC numbers, and so could not be assigned unambiguously.
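To make the idea of a compartmentalized stoichiometric model concrete, the sketch below stores each reaction as a map from metabolite to stoichiometric coefficient and checks whether a flux vector balances every metabolite. It assumes the usual steady-state mass-balance reading of such a model, which this section does not spell out; the three-reaction network, the compartment tags `[c]` and `[g]`, and the flux values are hypothetical.

```python
# Toy network: each reaction maps metabolite -> stoichiometric coefficient.
# Compartment tags ([c] cytosol, [g] glyoxysome) are an illustrative convention.
REACTIONS = {
    "uptake":    {"ac[c]": 1},                # source of cytosolic acetate
    "transport": {"ac[c]": -1, "ac[g]": 1},   # cytosol -> glyoxysome
    "consume":   {"ac[g]": -1},               # sink inside the glyoxysome
}


def net_production(reactions, fluxes):
    """Return the net production rate of every metabolite (one row of S.v each)."""
    balance = {}
    for rxn, stoich in reactions.items():
        for met, coeff in stoich.items():
            balance[met] = balance.get(met, 0.0) + coeff * fluxes.get(rxn, 0.0)
    return balance


def is_steady_state(reactions, fluxes, tol=1e-9):
    """True when no metabolite accumulates or is depleted."""
    return all(abs(x) <= tol for x in net_production(reactions, fluxes).values())


balanced = {"uptake": 2.0, "transport": 2.0, "consume": 2.0}
unbalanced = {"uptake": 2.0, "transport": 2.0, "consume": 1.0}
print(is_steady_state(REACTIONS, balanced), is_steady_state(REACTIONS, unbalanced))
```

Writing the transport step as its own reaction is what lets the same metabolite be balanced separately in each compartment.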

We confirmed EC assignments for 174 transcripts by assigning enzymatic domains to the protein products using hidden Markov model-based software HMMER8 (Supplementary Table 4) and experimentally verified these transcripts in two ways. First, we performed RT-PCR with primers corresponding to putative open reading frames (ORFs) encoding central metabolic enzymes (Supplementary Table 5). The successful cloning and a matched sequence9 of an ORF to its predicted model indicated the presence of the hypothesized transcript, whereas failure in this task was most often due to annotation errors of ORF termini2. Second, we carried out RACE on ORFs that either could not be cloned via RT-PCR or were confirmed only at one end, with the aim of correcting ORF termini annotation errors. Using RT-PCR, we confirmed 78% of the tested JGI v3.1 ORF models, and RACE allowed confirmation of 53% and refinement of 24% of the ORFs that we could not verify by RT-PCR. Altogether, we verified 90%, refined structural annotation of 5% and provided experimental evidence for 99% of the 174 examined ORFs encoding central metabolic enzymes (Fig. 2c and Supplementary Table 4). Our experimental verification of ORF models guided refinement of the metabolic model in the next cycle of our iterative methodology, and generated ORF clones can be used for downstream studies.
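The overall percentages follow from the stage-wise ones by simple arithmetic, as the short check below illustrates. It assumes that the RACE percentages are taken relative to the ORFs not confirmed by RT-PCR and that all examined ORFs went through RT-PCR first; the function name is hypothetical. The 99% figure cannot be reproduced from these three percentages alone and is taken as reported.

```python
def combine_stages(rt_pcr_confirmed, race_confirmed, race_refined):
    """Convert stage-wise fractions into fractions of all examined ORFs."""
    remaining = 1.0 - rt_pcr_confirmed            # ORFs not confirmed by RT-PCR
    verified = rt_pcr_confirmed + race_confirmed * remaining
    refined = race_refined * remaining
    return verified, refined


verified, refined = combine_stages(0.78, 0.53, 0.24)
# 0.78 + 0.53 * 0.22 = 0.8966 and 0.24 * 0.22 = 0.0528
print(f"verified: {verified:.1%}, refined: {refined:.1%}")
```

Rounded to whole percentages, these values agree with the 90% verified and 5% refined reported above.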

We expanded the metabolic network reconstruction to include more complete coverage of all pathways included in the initial model. For example, the glyoxylate metabolism pathway in our initial network reconstruction included only four enzymes needed for acetate uptake, but our final reconstruction included 16 enzymes, reflecting more complete curation of this pathway. After additionally updating the metabolic network reconstruction with transcript verification results, we validated the model by comparing in silico predictions to quantitative literature-based physiological parameters under a variety of environmental conditions and qualitative literature-based characterization of known mutants (Supplementary Note, Supplementary Tables 6,7 and Supplementary Fig. 3). Agreement between in silico predictions and existing experimental data brought confidence to predictions of metabolic engineering targets (Supplementary Fig. 4).

The resulting network reconstruction, named iAM303 per established convention10, accounted for 259 reactions corresponding to 106 distinct EC terms (Supplementary Fig. 5, Supplementary Tables 8,9 and Supplementary Data 1). Of the experimentally tested JGI v3.1 transcripts corresponding to 65 unique EC terms from the initial metabolic model, only phosphofructokinase and the Rieske iron-sulfur protein of ubiquinol-cytochrome c oxidoreductase complex were not verified in our RT-PCR or RACE experiments: we left unverified one of the four transcripts corresponding to phosphofructokinase and one of the three transcripts corresponding to ubiquinol-cytochrome c oxidoreductase complex (the Rieske iron-sulfur protein) (Supplementary Table 4). As we grew our cultures under constant light, these results suggest that we identified light/dark–regulated forms of transcripts corresponding to these enzymes, evidence for which has been documented for phosphofructokinase in the cyanobacterium Synechocystis sp.11. Although any parallel drawn from cyanobacteria is tentative, that the unverified phosphofructokinase transcript was the only one mapped by subcellular localization prediction7 to the chloroplast further indicates light/dark regulation may occur in the eukaryotic C. reinhardtii. These findings indicate our integrative approach is flexible toward functional annotation of differentially regulated transcripts and transcript variants.

With ORF verification results for all annotated enzymes in the current version of our metabolic network reconstruction, we demonstrated a complete cycle of our iterative approach. Although not all enzymes in the model could be completely validated experimentally, we seek to recover these enzymes in the next round of experiments. For enzymes present in the network reconstruction but lacking functionally assigned transcripts in the C. reinhardtii genome, we performed more detailed searches using position-specific iterative BLAST (PSI-BLAST) to assign likely targets to corresponding EC numbers (Table 1); newly assigned transcript models can be followed up in the next iteration of experiments. EC terms annotated in JGI v3.1 which were not fully verified by our RACE and RT-PCR transcript verification experiments, but are supported by both literature and modeling evidence, suggest corresponding transcripts are present in C. reinhardtii, particularly under dark conditions. In the next round of experiments, we will attempt to verify these transcripts in the absence of light. Our structural reannotation of transcripts will also inform reannotation of functional enzymatic domains needed to refine and expand our metabolic network model.
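The follow-up actions described here amount to sorting the model's EC terms by the evidence available for each. The sketch below shows one possible triage using set operations; the category names and the placeholder EC labels (`EC-A` to `EC-D`) are hypothetical and do not correspond to entries in Table 1.

```python
def triage(model_ecs, annotated_ecs, verified_ecs):
    """Sort the model's EC terms by the kind of follow-up they need."""
    model_ecs = set(model_ecs)
    annotated_ecs = set(annotated_ecs)
    verified_ecs = set(verified_ecs)
    return {
        # Transcript confirmed experimentally: nothing further to do.
        "verified": sorted(model_ecs & verified_ecs),
        # Annotated but unconfirmed: candidates for re-testing, e.g. in the dark.
        "retest": sorted((model_ecs & annotated_ecs) - verified_ecs),
        # No annotated transcript at all: candidates for a deeper homology search.
        "search": sorted(model_ecs - annotated_ecs),
    }


plan = triage(
    model_ecs={"EC-A", "EC-B", "EC-C", "EC-D"},
    annotated_ecs={"EC-A", "EC-B", "EC-C"},
    verified_ecs={"EC-A", "EC-B"},
)
print(plan)
```

Each EC term in the model lands in exactly one category, provided every verified term is also an annotated one.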

Table 1
EC terms guiding reconciliation of literature, modeling and experimental evidence

Although throughput of our method is modest compared to fully automated computational approaches, we achieved higher quality structural and functional annotation for a targeted set of metabolic enzymes. Accordingly, our integrative approach produced: (i) a well-validated metabolic network reconstruction of C. reinhardtii, (ii) functional annotation needed to map the network reconstruction to associated transcripts and (iii) experimentally based structural annotation, providing the requisite toolset for metabolic engineering toward improved biofuel production (Supplementary Fig. 4). Whereas the latter does not provide direct proof of function, it establishes the necessary condition upon which functional assignments can be proposed, and targeted experiments may be performed to verify function.

With only 1% of experimentally tested transcripts left unverified, our effort provides proof of concept for the proposed approach integrating network analysis with experimental transcript verification. Because this success may be attributed in part to our focus on central metabolism, enzymes and pathways of which are generally the best characterized, our manual curation efforts will be even more important in informing high-quality transcript annotation refinement as we extend our metabolic model to the genome-wide scale. Although our work has focused on C. reinhardtii, integration of gene annotation experiments with network reconstruction can be applied broadly toward improved annotation of existing and emerging genome sequences. Our pipeline for functional annotation based on existing annotation of A. thaliana provides a computationally efficient approach to extract functional annotation for species with one or more well-annotated close relatives. For new genome sequences without availability of closely related reference sequence, more sophisticated approaches, including PSI-BLAST and hidden Markov model–based programs, may provide viable alternatives. Although existing transcriptomic technologies lag behind RT-PCR and RACE in their ability to provide well-defined ORF structure and precise definition of exon-boundaries for eukaryotic sequence data, emerging sequencing technologies12 open possibilities to scale up the throughput of our methodology. Finally, we may look beyond metabolic network modeling toward reconstruction of regulatory13 and signaling14 networks as alternative systems-level frameworks to guide future efforts.

METHODS

Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturemethods/.

Acknowledgments

This research was supported by the Office of Science (Biological and Environmental Research), US Department of Energy, grant DE-FG02-07ER64496 (to J.A.P. and K.S.-A.), the Jane Coffin Childs Memorial Fund for Medical Research (to E.F.Y.H.) and by National Science Foundation IGERT training grant DGE0504645 (to R.L.C.).

Footnotes

AUTHOR CONTRIBUTIONS

A.M., A.K.C., R.L.C. and I.T. reconstructed metabolic networks; L.G., R.R.M., X.Y. and E.M. performed transcript verification experiments; E.F.Y.H. performed localization prediction; L.G., C.L., Y.S., C.F. and T.H. annotated transcripts and analyzed sequences; S.B. annotated transcripts; D.E.H. and M.V. initially developed the transcript verification pipeline; A.M., L.G., E.F.Y.H., K.S.-A. and J.A.P. developed the pipeline to integrate the model with experiments; A.M., L.G., E.F.Y.H., C.L., R.L.C., R.R.M., K.S.-A. and J.A.P. wrote and edited the manuscript; D.E.H. and M.V. edited the manuscript; K.S.-A. guided transcript verification experiments and transcript annotation; J.A.P. guided the metabolic network reconstruction; J.A.P. and K.S.-A. conceived the study.

Note: Supplementary information is available on the Nature Methods website.

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

References

1. Feist AM, Herrgård MJ, Thiele I, Reed JL, Palsson B. Nat Rev Microbiol. 2009;7:129–143.
2. Reboul J, et al. Nat Genet. 2001;27:332–336.
3. Jones SJM. Annu Rev Genomics Hum Genet. 2006;7:315–338.
4. Frishman D. Chem Rev. 2007;107:3448–3466.
5. Boyle NR, Morgan JA. BMC Syst Biol. 2009;3:4.
6. Apweiler R, et al. Nucleic Acids Res. 2004;32:D115–D119.
7. Lu Z, et al. Bioinformatics. 2004;20:547–556.
8. Zhang Z, Wood WI. Bioinformatics. 2003;19:307–308.
9. Walhout AJ, et al. Methods Enzymol. 2000;328:575–592.
10. Reed JL, Vo TD, Schilling CH, Palsson BO. Genome Biol. 2003;4:R54.
11. Kucho K, et al. J Bacteriol. 2005;187:2190–2199.
12. Shendure J, Ji H. Nat Biotechnol. 2008;26:1135–1145.
13. Herrgård MJ, Covert MW, Palsson B. Curr Opin Biotechnol. 2004;15:70–77.
14. Papin JA, Hunter T, Palsson BO, Subramaniam S. Nat Rev Mol Cell Biol. 2005;6:99–111.