The Subsystems Approach to Genome Annotation and its Use in the Project to Annotate 1000 Genomes

1 Fellowship for Interpretation of Genomes, 15W155 81st Street, Burr Ridge, IL 60527, USA
2 Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
3 Center for Biotechnology, Institute for Genome Research, Bielefeld University, 33594 Bielefeld, Germany
4 International NRW Graduate School in Bioinformatics & Genome Research, Institute for Genome Research, Bielefeld University, 33594 Bielefeld, Germany
5 Emerson Hall, University of Florida, PO Box 14425, Gainesville, FL 32604, USA
6 Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
7 Center for Microbial Sciences, San Diego State University, San Diego, CA 92813, USA
8 The Burnham Institute, San Diego, CA 92037, USA
9 Department of Microbiology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
10 Computer Science Department, Middle Tennessee State University, Murfreesboro, TN 37132, USA
11 Danish Genome Institute, Gustav Wieds vej 10 C, DK-8000 Aarhus C, Denmark
12 Computation Institute, University of Chicago, Chicago, IL 60637, USA
13 Departments of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA
14 Department of Horticultural Science, University of Florida, Gainesville, FL 32611, USA
15 Department of Chemistry, Portland State University, Portland, OR 97207, USA
16 Department of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853, USA
17 University of California, San Diego, CA 92093, USA
18 Cleveland BioLabs, Inc., Cleveland, OH 44106, USA

*To whom correspondence should be addressed. Tel: +1 630 325 4178; Fax: +1 630 325 4179; Email: Veronika/at/theFIG.info

Received June 9, 2005; Revised September 8, 2005; Accepted September 8, 2005.

Copyright © The Author 2005. Published by Oxford University Press. All rights reserved.
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions/at/oxfordjournals.org

Abstract
The release of the 1000th complete microbial genome is expected within the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools were developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystems approach and offer the first release of our growing library of populated subsystems. The initial release of data includes 180,177 distinct proteins with 2133 distinct functional roles.
These data come from 173 subsystems and 383 different organisms.

INTRODUCTION
In the 10 years since the first complete bacterial genome was released in 1995 (1), the number of sequenced complete genomes has grown exponentially. More than 200 complete genomes have been released, and, based on past growth, we anticipate that the 1000th genome will be sequenced at some point during 2007 (Figure 1).
In response to these challenges, the Fellowship for Interpretation of Genomes (FIG) launched the ‘Project to Annotate 1000 Genomes’. The Project embodies a specific strategic view of how to approach high-throughput annotation: the effort is organized around subsystem experts, individuals who master the details of a specific subsystem and then analyze and annotate the genes that make up that subsystem over an entire collection of genomes. We argue that a subsystems-based approach provides many benefits compared to more traditional techniques of genome annotation:
WHAT IS A SUBSYSTEM
A subsystem is a set of functional roles that together implement a specific biological process or structural complex (Table 1). A subsystem may be thought of as a generalization of the term pathway. Thus, just as glycolysis is composed of a set of functional roles (glucokinase, glucose-6-phosphate isomerase, phosphofructokinase, etc.), a complex like the ribosome or a transport system can be viewed as a collection of functional roles. In practice, we put no restriction on how curators select the set of functional roles they wish to group into a subsystem, and we find subsystems being created to represent the functional roles that make up pathogenicity islands, prophages, transport cassettes and complexes (although many of the existing subsystems do correspond to metabolic pathways).

The concept of a populated subsystem extends the basic notion of a subsystem: it is a subsystem together with a spreadsheet depicting the exact genes that implement the functional roles of the subsystem in specific genomes. The populated subsystem specifies which organisms include operational variants of the subsystem and which genes in those organisms implement its functional roles. Each column in the spreadsheet corresponds to a functional role from the subsystem, each row represents a genome, and each cell identifies the genes within that genome that encode proteins implementing the corresponding functional role (Figure 2).
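The spreadsheet view of a populated subsystem maps naturally onto a small data structure. The following is a minimal Python sketch under stated assumptions, not the SEED's own representation: the class and method names (`PopulatedSubsystem`, `SpreadsheetRow`, `add_genome`, `genes_for_role`) are hypothetical, variant codes are assumed to be integers, and the role names and gene identifiers are made-up placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class SpreadsheetRow:
    """One row of a populated subsystem spreadsheet: a single genome."""
    genome: str        # genome identifier (placeholder string)
    variant_code: int  # operational variant of the subsystem in this genome
    # role -> set of gene identifiers implementing that role in this genome
    cells: dict[str, set[str]] = field(default_factory=dict)


@dataclass
class PopulatedSubsystem:
    """A subsystem (its functional roles) plus the genome-by-role spreadsheet."""
    name: str
    roles: list[str]  # spreadsheet columns, one per functional role
    rows: list[SpreadsheetRow] = field(default_factory=list)

    def add_genome(self, genome: str, variant_code: int,
                   assignments: dict[str, list[str]]) -> None:
        """Populate the subsystem by adding one row (genome) to the spreadsheet."""
        unknown = set(assignments) - set(self.roles)
        if unknown:
            raise ValueError(f"roles not in subsystem {self.name}: {sorted(unknown)}")
        # Every column gets a cell; roles with no assigned gene get an empty set.
        cells = {role: set(assignments.get(role, ())) for role in self.roles}
        self.rows.append(SpreadsheetRow(genome, variant_code, cells))

    def genes_for_role(self, role: str) -> dict[str, set[str]]:
        """Read one spreadsheet column: genome -> genes implementing `role`."""
        if role not in self.roles:
            raise KeyError(role)
        return {row.genome: row.cells[role] for row in self.rows}


# Usage with made-up role names and gene identifiers:
ss = PopulatedSubsystem("Example_subsystem", ["Role A", "Role B"])
ss.add_genome("genome_1", 1, {"Role A": ["gene_1"], "Role B": ["gene_2", "gene_3"]})
ss.add_genome("genome_2", -1, {})
print(ss.genes_for_role("Role B"))
```

Rejecting roles that are not columns of the subsystem keeps each cell tied to the subsystem's own vocabulary, which is the property the rest of the paper relies on.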
The act of populating the subsystem amounts to adding rows (i.e. genomes) to the spreadsheet. Since these concepts are fundamental to our discussion, we illustrate them in Figure 2. Note that each row in the spreadsheet has an associated variant code. The set of roles that make up the example subsystem includes all of the functional roles needed to encode three common variants of the pathway. The variant codes distinguish three alternative means of converting N-formimino-L-glutamate to L-glutamate. We have adhered to the position that experts encoding subsystems must decide exactly which functional roles to include (and exactly how to express each functional role), as well as which variant codes to use. We reserve only two variant codes: 0 to represent work in progress and −1 to represent the absence of an operational variant.

A FRAMEWORK FOR DEVELOPING A PRECISE VOCABULARY FOR FUNCTIONAL ROLES
Controlled vocabularies have often been proposed in computer-assisted annotation and data mining (4,5). Subsystems technology supports the definition of a controlled vocabulary for gene function. By defining the functional roles that make up the subsystems they curate, domain experts impose a precise vocabulary for assigning function to the genes that implement each subsystem. Since the term ‘gene function’ has come to have several meanings, it is important to distinguish between four concepts:
To this mix of concepts we add the notion of a subsystem connection. A gene can be connected to one or more functional roles, which induces connections to specific subsystems (those that contain these functional roles). In the example above, it would be the connection to the subsystem ‘Lysine_Biosynthesis_DAP_Pathway’. Although product names often include special properties (e.g. ‘thermostable’ or ‘lysine-sensitive’) and occasionally clues to function (e.g. ‘similar to death associated protein kinase’), subsystem connections unambiguously reference specific functional roles included in the definition of a subsystem.

Initially, the number of populated subsystems grew rapidly, including numerous metabolic pathways as well as non-metabolic subsystems ranging from flagella (http://www.theseed.org/annocopy/FIG/subsys.cgi?ssa_name=Flagellum&request=show_ssa), pathogenicity islands (http://www.theseed.org/annocopy/FIG/subsys.cgi?ssa_name=Mannose-sensitive_hemagglutinin_type_4_pilus&request=show_ssa) and secretory systems [http://www.theseed.org/annocopy/FIG/subsys.cgi?ssa_name=General_secretory_pathway_(Sec-SRP)_complex_(TC_3.A.5.1.1)&request=show_ssa] to complexes like the ribosome and proteasome. As both the subsystems and the resulting subsystem connections matured, considerable overlap between subsystems emerged. Users developing subsystems on their own machines and sharing them through the clearinghouse exacerbated differences in style, and hence conflicts between subsystems.
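The way a gene-to-role connection induces gene-to-subsystem connections can be computed by inverting the subsystem definitions. This is a minimal Python sketch, assuming genes and subsystems are supplied as plain dictionaries of role strings; the function name and data layout are hypothetical and are not the SEED's API. Roles are matched as exact strings, on the assumption that the role text itself is the controlled vocabulary.

```python
from collections import defaultdict


def induced_subsystem_connections(
    gene_roles: dict[str, set[str]],
    subsystem_roles: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Map each gene to the subsystems containing any of its functional roles.

    gene_roles:      gene_id -> functional roles the gene is connected to
    subsystem_roles: subsystem name -> functional roles in its definition
    """
    # Invert the subsystem definitions: role -> subsystems that include it.
    role_index: dict[str, set[str]] = defaultdict(set)
    for subsystem, roles in subsystem_roles.items():
        for role in roles:
            role_index[role].add(subsystem)

    connections = {}
    for gene, roles in gene_roles.items():
        subsystems: set[str] = set()
        for role in roles:
            # Exact string match: the role text itself is the controlled vocabulary.
            subsystems |= role_index.get(role, set())
        connections[gene] = subsystems
    return connections


# Illustrative role strings (simplified, not the SEED's actual role names):
subsystems = {
    "TCA_Cycle": {"Aconitase", "Citrate synthase"},
    "Glyoxylate_Synthesis": {"Aconitase", "Isocitrate lyase"},
}
genes = {"gene_1": {"Aconitase"}, "gene_2": {"Isocitrate lyase"}, "gene_3": {"Unassigned role"}}
print(induced_subsystem_connections(genes, subsystems))
```

Because matching is by exact string, two curators who spell the same role differently produce disjoint connections for the same gene, which is one source of the conflicts between subsystems.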
For example, functional roles corresponding to the notion of aconitase exist in at least three distinct subsystems, developed independently by different curators: the TCA cycle (http://www.theseed.org/annocopy/FIG/subsys.cgi?ssa_name=TCA_Cycle&request=show_ssa), the methylcitrate cycle (http://www.theseed.org/annocopy/FIG/subsys.cgi?ssa_name=Methylcitrate_cycle&request=show_ssa) and glyoxylate synthesis (http://www.theseed.org/annocopy/FIG/subsys.cgi?ssa_name=Glyoxylate_Synthesis&request=show_ssa). In at least one instance a curator wished to carefully distinguish three distinct forms of the enzyme. Initially, each curator annotated the same protein-encoding genes with different functional roles; this quickly became untenable because it produced conflicts. Supporting a uniform terminology required detecting these conflicts and resolving them by renaming functional roles to a single vocabulary used consistently by all three subsystems. Rather than impose a centralized mechanism for resolving such conflicts, we used a completely decentralized approach. To facilitate coordination and communication between end users, aid conflict resolution and eliminate redundancy, a multi-author website was developed using Wiki technology (http://www-unix.mcs.anl.gov/SEEDWiki/moin.cgi/MoinMoin). The subsystem bulletin board (http://www.theseed.org/wiki/moin.cgi/SubsystemBulletinBoard) provides an overview of the subsystems and highlights individual researchers' efforts. For a more detailed discussion of each subsystem, a Forum was developed using vBulletin technology (http://www.vbulletin.com/). The Forum (http://www.subsys.info) groups subsystems by class, and each subsystem has a discussion area for comments, questions, suggestions and ideas. In addition to these resources, interactive conflict detection and resolution software was developed for installing subsystems in the SEED database.
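A first-pass check for the kind of conflict described above can be sketched in Python: flatten the spreadsheets of several populated subsystems into (subsystem, gene, role) triples and flag genes that carry more than one distinct role string. This is an illustrative sketch, not the SEED's conflict detection software; the function name and input layout are hypothetical, and the role string "Aconitase, form 2" is made up.

```python
from collections import defaultdict
from typing import Iterable


def find_role_conflicts(
    assignments: Iterable[tuple[str, str, str]],
) -> dict[str, dict[str, set[str]]]:
    """Flag genes that different subsystems annotate with different role strings.

    assignments: (subsystem, gene_id, functional_role) triples, e.g. flattened
    from the spreadsheets of several populated subsystems.
    Returns {gene_id: {role: subsystems using that role}} for every gene that
    carries more than one distinct role string.
    """
    by_gene: dict = defaultdict(lambda: defaultdict(set))
    for subsystem, gene, role in assignments:
        by_gene[gene][role].add(subsystem)
    return {
        gene: {role: set(subs) for role, subs in roles.items()}
        for gene, roles in by_gene.items()
        if len(roles) > 1  # same gene, different role strings
    }


triples = [
    ("TCA_Cycle", "gene_1", "Aconitase"),
    ("Methylcitrate_cycle", "gene_1", "Aconitase, form 2"),  # made-up role name
    ("Glyoxylate_Synthesis", "gene_1", "Aconitase"),
    ("TCA_Cycle", "gene_2", "Citrate synthase"),
]
print(find_role_conflicts(triples))
```

A gene flagged here is not necessarily misannotated: a genuinely multifunctional protein can legitimately carry several roles, so the output is best read as a list of candidates for curators to discuss rather than an automatic verdict.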
Ultimately, the success of our approach has rested on the good will and common desire of curators to produce a consistent, precise vocabulary for functional roles, and we feel that this has worked well. At any given time, conflicts may exist because new subsystems are being developed or existing ones extended, but tools that point out these conflicts alert curators to them. No centralized authority is employed (although, on occasion, curators do settle disagreements by consulting outside experts). Conflicts range from simple differences in the spelling of functional roles to disagreements about specificity and numerous other issues. In all cases curators have reached settlements through discussions that led to either consensus names or extended names. Once agreement has been reached and consistency established, changing the precise string of text that describes a functional role at some later point in time is trivial. The result has been a vocabulary for functional roles that is precise, reasonably consistent and rapidly improving. Our strategy for coupling this vocabulary with widely used ontologies such as GO will be to attach GO terms to each of the functional roles (inducing connections to genes via subsystem connections).

SUBSYSTEMS: A TECHNOLOGY INDEPENDENT OF ANNOTATION SYSTEMS
The subsystems technology described herein was developed with two primary goals in mind. The first was to define a simple, portable text representation of a populated subsystem, which allows populated subsystems to be exchanged, archived and updated over the Internet. The second was to develop a clearinghouse where curators can publish populated subsystems for exchange with other users.
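To make the idea of a portable text representation concrete, here is a minimal Python sketch of one possible encoding. The layout (subsystem name on line 1, tab-separated roles on line 2, then one tab-separated row per genome holding the genome, its variant code and one cell per role with comma-separated gene identifiers) is an assumption for illustration and is not the SEED's actual exchange format; the function names are hypothetical.

```python
def subsystem_to_text(name, roles, rows):
    """Serialize a populated subsystem to a simple tab-separated text format.

    rows: list of (genome, variant_code, {role: [gene ids]}) tuples.
    An empty cell means no gene has been assigned to that role.
    """
    lines = [name, "\t".join(roles)]
    for genome, variant, cells in rows:
        fields = [genome, variant]
        # One cell per role, in column order; gene ids sorted for stable output.
        fields += [",".join(sorted(cells.get(role, []))) for role in roles]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


def subsystem_from_text(text):
    """Parse the format produced by subsystem_to_text."""
    lines = text.rstrip("\n").split("\n")
    name = lines[0]
    roles = lines[1].split("\t")
    rows = []
    for line in lines[2:]:
        genome, variant, *cells = line.split("\t")
        # Keep only non-empty cells, mapping each back to its role column.
        assignments = {role: cell.split(",") for role, cell in zip(roles, cells) if cell}
        rows.append((genome, variant, assignments))
    return name, roles, rows


rows = [
    ("genome_1", "1", {"Role A": ["gene_2", "gene_1"], "Role B": ["gene_3"]}),
    ("genome_2", "-1", {}),
]
text = subsystem_to_text("Example_subsystem", ["Role A", "Role B"], rows)
print(subsystem_from_text(text))
```

The sketch assumes that role names and identifiers contain no tabs, commas or newlines; a format intended for real exchange would need escaping or a more structured encoding.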
The clearinghouse is available for direct querying from within a program (http://clearinghouse.theseed.org/) or via a web browser (http://clearinghouse.theseed.org/clearinghouse_browser.cgi). This technology ensures that subsystems information can be shared in a platform-independent manner, without requiring any centralized resource (such as a pathway collection). Any annotation environment can be developed or modified to support the creation and curation of subsystems, using the clearinghouse (or, if desired, a local clearinghouse) as a repository.

THE SEED TECHNOLOGY TO SUPPORT SUBSYSTEMS
The SEED annotation environment is the first annotation environment that supports the creation, curation, population and exchange of subsystems. It supports publishing subsystems to a clearinghouse, and downloading and installing subsystems developed at other sites. The SEED was developed by an international collaboration led by members of FIG and Argonne National Laboratory (6). The software is being made available as open source under the GNU General Public License (GPL) from the ftp site ftp://ftp.theseed.org/SEED.

Only a few enhancements would be needed for an existing annotation system to support the analysis of subsystems. The software would have to encode populated subsystems as objects and decode populated subsystems retrieved from the clearinghouse; it would need to publish populated subsystems to, and request them from, the clearinghouse; and it would have to be able to define the functional roles in initial subsystems and establish the subsystem connections between protein-encoding genes, functional roles and subsystems.
EXAMPLE POPULATED SUBSYSTEMS
Our populated subsystems were assembled into a single collection with a consistent formulation of functional roles and released via the web (http://www.theseed.org/Release1_Subsystems/index.html). An open source collection of software tools has been released via FTP (ftp://ftp.theseed.org/SEED). To illustrate the advantages of subsystem-based annotation over ‘traditional’ annotation systems, several subsystems are described below.

Leucine Degradation and HMG-CoA Metabolism (http://www.theseed.org/annocopy/FIG/subsys.cgi?ssa_name=Leucine_Degradation_and_HMG-CoA_Metabolism&request=show_ssa)
The populated subsystem representing leucine catabolism/HMG-CoA synthesis is depicted in Figure 3.
In humans, leucine catabolism is coupled to sterol biosynthesis via a hydroxymethylglutaryl-coenzyme A (HMG-CoA) intermediate. The pathway is well characterized because defects in individual steps cause hereditary metabolic disorders such as isovaleric acidemia, methylcrotonylglycinuria, methylglutaconic aciduria and 3-hydroxy-3-methylglutaric aciduria (8,9,10). Moreover, human HMG-CoA reductase is a target of cardiovascular disease therapy because of its rate-limiting role in sterol biosynthesis (11). In contrast, only the early catabolic steps had been characterized in bacterial genomes; no genes were directly connected to enzymatic steps beyond isovaleryl-CoA (metabolite II in Figure 3B

A combination of functional and genome context analysis, as depicted in the populated subsystem spreadsheet (Figure 3C

Another functional inference from the analysis of this subsystem was a connection between leucine catabolism and acetoacetate metabolism (as illustrated in Figure 3B

Panels B and C in Figure 3

This example illustrates how prokaryotic chromosomal clustering can influence the interpretation of pathways, the prediction of missing genes and the projection of annotations between prokaryotic and eukaryotic genes. The observations also contributed to the interpretation of the evolutionary history of a large and diversified group of proteins. More such examples have been published elsewhere (3,14).

Coenzyme A biosynthesis subsystem (http://www.theseed.org/annocopy/FIG/subsys.cgi?ssa_name=Coenzyme_A_Biosynthesis&request=show_ssa)
Coenzyme A (CoA) is a universal and essential cofactor in all forms of cellular life (15). Earlier bioinformatics analysis of CoA biosynthesis revealed a number of interesting variations between species (3,16,17). In the respective SEED subsystem (see Figure 4
A possible fourth non-orthologous form of PANK can be inferred from the analysis of Archaea. The candidate for the missing archaeal PANK is a member of the GHMP kinase family that clusters on the chromosome with several other CoA biosynthetic genes in some Archaea (e.g. PAE3407 of Pyrobaculum aerophilum). Another conserved family (represented by PAE1629 of P.aerophilum) may fulfill the role of dephospho-CoA kinase (DPCK), which is still ‘missing’ in all Archaea. This conjecture is based on long-range sequence similarity to bacterial and eukaryotic enzymes (as suggested by the tentative annotation of COG0237 at NCBI, http://www.ncbi.nlm.nih.gov/COG/old/palox.cgi?COG0237). Both functional predictions [also suggested by (17)] require experimental verification. Another open problem within this subsystem is a missing aspartate decarboxylase in a number of genomes that otherwise have a complete set of genes for de novo synthesis. Several examples illustrating major functional variants of the subsystem are outlined in Figure 4.

Ribosomal proteins (http://www.theseed.org/SubsystemStories/Ribosomal_proteins/abstract.htm)
Historically, ribosomal proteins were identified in several important experimental organisms, including E.coli, Bacillus species, yeast, rat and Halobacterium. In each case, a unique nomenclature was developed. More recently, given the availability of so many sequences, several groups sought unified nomenclatures. In the cases of Bacteria and Eukarya, these efforts were highly successful. The most problematic aspects of the conventions were (i) the failure to indicate uniformly whether a given label is based upon the bacterial or the eukaryal numbering, and (ii) the difficulty of linking equivalent eukaryal and bacterial terms. There are only two proteins (S3 and L3) for which the bacterial and eukaryal numbers are the same.
The situation was particularly confusing in Archaea, to which the bacterial nomenclature was applied except when no bacterial homolog existed, in which case the eukaryal label was used. To address these problems, a dual labeling was adopted in which bacterial proteins were given the bacterial label (always explicitly including the ‘p’, e.g. S5p), followed by the designation of the corresponding eukaryal protein in parentheses (always with the explicit ‘e’, e.g. S2e). Similarly, for Eukarya, the eukaryal protein designation is given first, followed by the bacterial label in parentheses. In Archaea, the proteins are in all but a few cases clearly of the eukaryal type, and the eukaryal term is given first. One of the most important consequences of this nomenclature is that a text-based search is always unambiguous as to whether the bacterial or eukaryal numbering is intended. For example, a search for L11p will return bacterial L11 and eukaryal L12, but not bacterial L5 (the equivalent of eukaryal L11). A second key decision was to use the terms LSU and SSU to distinguish the subunits, rather than 30S, 40S, 50S and 60S. In addition to further unifying the nomenclature, this avoids two key sources of confusion. First, several eukaryal ribosomes (especially organellar ribosomes) have been assigned ‘non-standard’ sizes, so searching for 50S and/or 60S was not sufficient to ensure that all ribosomes were distinguished. Second, and more importantly, it avoids the temptation to use 50S to designate the LSU of a eukaryal mitochondrial ribosome. Instead, we have explicitly identified all organellar proteins as ‘mitochondrial’ or ‘chloroplast’. The development of this nomenclature demonstrated the power of the subsystems approach for encoding non-metabolic subsystems, and the utility of functional roles in defining a controlled vocabulary for gene product function.
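The dual-labeling convention, and why it makes text search unambiguous, can be sketched in Python. The label prefix ("LSU ribosomal protein ...") and the function names are assumptions for illustration, not the release's exact role strings; treating every archaeal protein as eukaryal-first is a simplification of the "all but a few cases" rule.

```python
import re
from typing import Optional


def ribosomal_label(domain: str, subunit: str,
                    bacterial: Optional[str], eukaryal: Optional[str]) -> str:
    """Build a dual label such as 'LSU ribosomal protein L11p (L12e)'.

    domain:    "Bacteria", "Eukarya" or "Archaea"
    subunit:   "LSU" or "SSU" (used instead of 30S/40S/50S/60S)
    bacterial: bacterial number, e.g. "L11", or None if there is no homolog
    eukaryal:  eukaryal number, e.g. "L12", or None if there is no homolog
    """
    b = f"{bacterial}p" if bacterial else None  # explicit 'p' suffix
    e = f"{eukaryal}e" if eukaryal else None    # explicit 'e' suffix
    # Bacteria put the bacterial label first; Eukarya (and, simplified here,
    # all Archaea) put the eukaryal label first.
    first, second = (b, e) if domain == "Bacteria" else (e, b)
    if first is None:  # no homolog in the preferred numbering
        first, second = second, None
    if first is None:
        raise ValueError("need at least one of bacterial/eukaryal")
    label = f"{subunit} ribosomal protein {first}"
    return f"{label} ({second})" if second else label


def search(labels: list[str], term: str) -> list[str]:
    """Return labels containing `term` (e.g. 'L11p') as a whole token."""
    pattern = re.compile(rf"\b{re.escape(term)}\b")
    return [label for label in labels if pattern.search(label)]


labels = [
    ribosomal_label("Bacteria", "LSU", "L11", "L12"),  # bacterial L11
    ribosomal_label("Eukarya", "LSU", "L11", "L12"),   # eukaryal L12
    ribosomal_label("Bacteria", "LSU", "L5", "L11"),   # bacterial L5
]
for hit in search(labels, "L11p"):
    print(hit)
```

With these inputs the search prints the first two labels (bacterial L11 and eukaryal L12) and skips bacterial L5, whose label contains "L11e" rather than "L11p", mirroring the example in the text.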
THE IMPACT OF POPULATED SUBSYSTEMS
As demonstrated by the examples above, populated subsystems can support two broad categories of research: advancing research on the populated subsystems themselves, and addressing numerous fundamental problems within bioinformatics. It is important to note that there are large, ongoing efforts that address similar objectives, most notably the KEGG (http://www.genome.jp/kegg/kegg2.html) (22,23), GO (http://www.geneontology.org/) (5) and MetaCyc (http://metacyc.org/) (24) projects. These are substantial projects, and we have in many ways built upon their work. Perhaps the most obvious difference between our work and these projects is that we have made it possible for all researchers to immediately develop detailed encodings of their particular area of expertise, to make these new encodings available to the research community, and to import the work of others in constructing a customized collection of subsystems covering their specific needs. This radically decentralized effort offers a different set of incentives for domain experts to participate, which is precisely what will be needed to improve existing annotations.

The primary utility of populated subsystems is that they often support substantially more accurate assignments of function to genes. In addition, analysis of a populated subsystem allows one to arrive at a precise notion of which forms (i.e. which variants) of the subsystem exist in which organisms. Further, the spreadsheet included in a populated subsystem often makes it vividly clear that a gene implementing a specific functional role is very likely to exist, even though it has not yet been identified. These so-called missing gene problems occur with surprising frequency.
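The missing-gene observation can be sketched as a scan over the spreadsheet. This minimal Python version assumes a hypothetical curator-supplied mapping from each variant code to the roles that variant requires (the function name, role names and genome names are made up); it skips rows carrying the reserved codes 0 (work in progress) and −1 (no operational variant).

```python
# Reserved variant codes described in the text.
WORK_IN_PROGRESS = 0
NO_OPERATIONAL_VARIANT = -1


def find_missing_genes(rows, required_roles):
    """Return (genome, role) pairs where a gene is expected but not yet found.

    rows:           list of (genome, variant_code, {role: set of gene ids})
    required_roles: {variant_code: roles that variant needs}; a hypothetical,
                    curator-supplied definition of each variant
    """
    missing = []
    for genome, variant, cells in rows:
        if variant in (WORK_IN_PROGRESS, NO_OPERATIONAL_VARIANT):
            continue  # no operational variant asserted, so nothing is "missing"
        for role in sorted(required_roles.get(variant, ())):
            if not cells.get(role):  # empty or absent cell
                missing.append((genome, role))
    return missing


rows = [
    ("genome_1", 1, {"Role A": {"gene_1"}, "Role B": set()}),
    ("genome_2", 2, {"Role A": {"gene_7"}}),
    ("genome_3", -1, {}),
    ("genome_4", 0, {}),
]
required = {1: {"Role A", "Role B"}, 2: {"Role A"}}
print(find_missing_genes(rows, required))  # [('genome_1', 'Role B')]
```

Each reported pair is only a candidate: a role the asserted variant implies should be present but for which no gene has yet been assigned. Deciding whether it is a true gap, a non-orthologous replacement or a misassigned variant code still requires expert analysis.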
In the two metabolic examples presented in this paper, and in various instances published in the Supplemental Material, we show in detail a few cases in which conjectures could easily be formulated once it had been established that a missing gene must actually be present. Finally, an extensive set of annotated subsystems lays the foundation for an accurate characterization of the metabolic network present in each organism. The existence of a collection of populated subsystems also has an impact on a number of important topics in bioinformatics:
THE RELEASE
Concurrent with the publication of this paper, we made an initial snapshot release of our collection of populated subsystems (a subset of those available via the SEED clearinghouse). This subset is available in a format that makes the data easily accessible for use in other systems or as raw data. The current release of 173 populated subsystems is available without restriction via the web. The supplementary online subsystems material includes three main components:
CONCLUSIONS
Within 2–3 years we will all have access to over a thousand sequenced genomes. These data will grow to become the central resource in modern biology, and annotating this collection is the core challenge of modern bioinformatics. In this paper we describe a new approach to annotation, based on the idea of subsystems, that promises to dramatically improve the quality and utility of annotations. This approach is central to the Project to Annotate 1000 Genomes and has been implemented in a suite of tools for genome annotation. The approach and technology provide one way to involve many domain experts in the genome annotation process. The technology for developing these subsystems now exists, technology for automatically adding new genomes to the collection of populated subsystems is being developed, and the initial collection is being made available to the research community.

Acknowledgments
Funding to pay the Open Access publication charges for this article was provided by the Fellowship for Interpretation of Genomes.

Conflict of interest statement. None declared.

REFERENCES
1. Fleischmann R.D., Adams M.D., White O., Clayton R.A., Kirkness E.F., Kerlavage A.R., Bult C.J., Tomb J.F., Dougherty B.A., Merrick J.M., et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science. 1995;269:496–512.
2. Haft D.H., Selengut J.D., Brinkac L.M., Zafar N., White O. Genome Properties: a system for the investigation of prokaryotic genetic content for microbiology, genome annotation and comparative genomics. Bioinformatics. 2005;21:293–306.
3. Osterman A., Overbeek R. Missing genes in metabolic pathways: a comparative genomics approach. Curr. Opin. Chem. Biol. 2003;7:238–251.
4. Overbeek R., Larsen N., Smith W., Maltsev N., Selkov E. Representation of function: the next step. Gene. 1997;191:GC1–GC9.
5. Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genet. 2000;25:25–29.
6. Overbeek R., Disz T., Stevens R. The SEED: a peer-to-peer environment for genome annotation. Commun. ACM. 2004;47:46–51.
7. Overbeek R., Devine D., Vonstein V. Curation is forever: comparative genomics approaches to functional annotation. Targets. 2003;2:138–146.
8. Tanaka K., Ikeda Y., Matsubara Y., Hyman D.B. Molecular basis of isovaleric acidemia and medium-chain acyl-CoA dehydrogenase deficiency. Enzyme. 1987;38:91–107.
9. Weyler W., Sweetman L., Maggio D.C., Nyhan W.L. Deficiency of propionyl-CoA carboxylase and methylcrotonyl-CoA carboxylase in a patient with methylcrotonylglycinuria. Clin. Chim. Acta. 1977;76:321–328.
10. Gibson K.M., Lee C.F., Hoffmann G.F. Screening for defects of branched-chain amino acid metabolism. Eur. J. Pediatr. 1994;153:S62–S67.
11. Marz W., Wieland H. HMG-CoA reductase inhibition: anti-inflammatory effects beyond lipid lowering? Herz. 2000;25:117–125.
12. Loupatty F.J., Ruiter J.P., IJlst L., Duran M., Wanders R.J. Direct nonisotopic assay of 3-methylglutaconyl-CoA hydratase in cultured human skin fibroblasts to specifically identify patients with 3-methylglutaconic aciduria type I. Clin. Chem. 2004;50:1447–1450.
13. Ly T.B., Peters V., Gibson K.M., Liesert M., Buckel W., Wilcken B., Carpenter K., Ensenauer R., Hoffmann G.F., Mack M., et al. Mutations in the AUH gene cause 3-methylglutaconic aciduria type I. Hum. Mutat. 2003;21:401–407.
14. Jordan I.K., Henze K., Fedorova N.D., Koonin E.V., Galperin M.Y. Phylogenomic analysis of the Giardia intestinalis transcarboxylase reveals multiple instances of domain fusion and fission in the evolution of biotin-dependent enzymes. J. Mol. Microbiol. Biotechnol. 2003;5:172–189.
15. Begley T.P., Kinsland C., Strauss E. The biosynthesis of coenzyme A in bacteria. Vitam. Horm. 2001;61:157–171.
16. Gerdes S.Y., Scholle M.D., D'Souza M., Bernal A., Baev M.V., Farrell M., Kurnasov O.V., Daugherty M.D., Mseeh F., Polanuyer B.M., et al. From genetic footprinting to antimicrobial drug targets: examples in cofactor biosynthetic pathways. J. Bacteriol. 2002;184:4555–4572.
17. Genschel U. Coenzyme A biosynthesis: reconstruction of the pathway in archaea and an evolutionary scenario based on comparative genomics. Mol. Biol. Evol. 2004;21:1242–1251.
18. Brand L.A., Strauss E. Characterization of a new pantothenate kinase isoform from Helicobacter pylori. J. Biol. Chem. 2005;280:20185–20188.
19. Daugherty M., Polanuyer B., Farrell M., Scholle M., Lykidis A., de Crecy-Lagard V., Osterman A. Complete reconstitution of the human coenzyme A biosynthetic pathway via comparative genomics. J. Biol. Chem. 2002;277:21431–21439.
20. Choudhry A.E., Mandichak T.L., Broskey J.P., Egolf R.W., Kinsland C., Begley T.P., Seefeld M.A., Ku T.W., Brown J.R., Zalacain M., et al. Inhibitors of pantothenate kinase: novel antibiotics for staphylococcal infections. Antimicrob. Agents Chemother. 2003;47:2051–2055.
21. Ye Y., Osterman A., Overbeek R., Godzik A. Automatic detection of subsystem/pathway variants in genome analysis. Bioinformatics. 2005;21:478–486.
22. Kanehisa M. A database for post-genome analysis. Trends Genet. 1997;13:375–376.
23. Kanehisa M., Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000;28:27–30.
24. Krieger C.J., Zhang P., Mueller L.A., Wang A., Paley S., Arnaud M., Pick J., Rhee S.Y., Karp P.D. MetaCyc: a multiorganism database of metabolic pathways and enzymes. Nucleic Acids Res. 2004;32:D438–D442.
25. Gelfand M.S., Novichkov P.S., Novichkova E.S., Mironov A.A. Comparative analysis of regulatory patterns in bacterial genomes. Brief. Bioinform. 2000;1:357–371.
26. Rodionov D.A., Vitreschak A.G., Mironov A.A., Gelfand M.S. Comparative genomics of thiamin biosynthesis in procaryotes: new genes and regulatory mechanisms. J. Biol. Chem. 2002;277:48949–48959.
27. Rodionov D.A., Mironov A.A., Gelfand M.S. Conservation of the biotin regulon and the BirA regulatory signal in eubacteria and archaea. Genome Res. 2002;12:1507–1516.
28. Koonin E.V., Galperin M.Y. Sequence-Evolution-Function: Computational Approaches in Comparative Genomics. 1st edn. Boston: Kluwer Academic Publishers; 2002.
29. Huynen M.A., Snel B., von Mering C., Bork P. Function prediction and protein networks. Curr. Opin. Cell Biol. 2003;15:191–198.
30. Xie G., Keyhani N.O., Bonner C.A., Jensen R.A. Ancient origin of the tryptophan operon and the dynamics of evolutionary change. Microbiol. Mol. Biol. Rev. 2003;67:303–342.