Copyright © 2008, American Society for Microbiology

Cohesion Group Approach for Evolutionary Analysis of TyrA, a Protein Family with Wide-Ranging Substrate Specificities

1 The Computation Institute, University of Chicago, Chicago, Illinois 60637
2 Mathematics and Computer Science, Argonne National Laboratory, Argonne, Illinois 60439
3 Bioscience Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545
4 Fellowship for Interpretation of Genomes, 15W155 81st Street, Burr Ridge, Illinois 60527
5 Emerson Hall, University of Florida, P.O. Box 14425, Gainesville, Florida 32604

*Corresponding author. Mailing address: Emerson Hall, University of Florida, P.O. Box 14425, Gainesville, FL 32604. Phone: (352) 475-3019. Fax: (352) 846-3631. E-mail: rjensen/at/ufl.edu

Abstract

Summary: Many enzymes and other proteins are difficult subjects for bioinformatic analysis because they exhibit variant catalytic, structural, regulatory, and fusion mode features within a protein family whose sequences are not highly conserved. However, such features reflect dynamic and interesting scenarios of evolutionary importance. The value of experimental data obtained from individual organisms is instantly magnified to the extent that given features of the experimental organism can be projected upon related organisms. But how can one decide how far along the similarity scale it is reasonable to go before such inferences become doubtful? How can a credible picture of evolutionary events be deduced within the vertical trace of inheritance in combination with intervening events of lateral gene transfer (LGT)? We present a comprehensive analysis of a dehydrogenase protein family (TyrA) as a prototype example of how these goals can be accomplished through the use of cohesion group analysis. 
With this approach, the full collection of homologs is sorted into groups by a method that eliminates bias caused by an uneven representation of sequences from organisms whose phylogenetic spacing is not optimal. Each sufficiently populated cohesion group is phylogenetically coherent and defined by an overall congruence with a distinct section of the 16S rRNA gene tree. Exceptions that occasionally are found implicate a clearly defined LGT scenario whereby the recipient lineage is apparent and the donor lineage of the gene transferred is localized to those organisms that define the cohesion group. Systematic procedures to manage and organize otherwise overwhelming amounts of data are demonstrated. INTRODUCTION Gene products and the genes encoding them exhibit a wealth of alternative character states (see Table 1 for definitions). This diversity can be equated with a vast repertoire of biochemical and physiological individualities that define the ever-divergent tree of life. For the most intensively studied gene/gene product systems, experimental documentation exists for only a small fraction of the hundreds of finished genomes that are now available. Given the contemporary pace of genome sequencing, this fraction will become increasingly smaller. Any new experimental results with a given gene product in a given organism immediately become of greatly expanded interest to the extent to which the various character states found and described can be extrapolated to related organisms. But how far can one proceed along a scale of diminishing sequence resemblance before confidence in projections of a known character state (e.g., the specificity of a specificity-variable enzyme) to its closest relatives becomes uncertain? How can one achieve an integrated and credible picture of what evolutionary events proceeded within the vertical genealogical trace and what events intervened via lateral gene transfer (LGT)?
In this review, we focus upon a dehydrogenase that functions in l-tyrosine biosynthesis as a prototype example of numerous enzymes which are important to understand but which are at the same time “difficult” subjects for bioinformatic analysis due to moderate sequence length, moderate conservation of sequence, and variable catalytic properties (e.g., substrate specificity). We introduce the concept of cohesion group analysis, whereby the available collection of a given protein homolog is sorted into many separate groups of high identity. Each sufficiently populated cohesion group is phylogenetically coherent and defined by an overall congruence with a distinct section of a 16S rRNA tree. Evolutionary progressions can be rigorously ascertained within cohesion groups, and interesting LGT events can be recognized. Because evolution often proceeds in a circuitous fashion, can make “jumps,” and may even reverse course, the evolutionary path is most reliably traced in a continuum of closely related organisms as a beginning step. Cohesion groups are thus rigorous units for making bioinformatic and evolutionary inferences because they represent genealogical segments taken at relatively shallow hierarchical levels. Once the latter foundation is established, the scope of the analysis can be progressively enlarged because the continual availability of sequences from new genomes is expected to result not only in the formulation of new cohesion groups but also in the merging of cohesion groups as phylogenetic gaps are progressively filled. In addition, as exemplified by previous work with the seven proteins of tryptophan biosynthesis (78), concatenation of multiple proteins has been shown to be a next step that confers greatly expanded resolving power. The assembly of such “supercohesion groups,” which correspond to metabolic segments, is envisioned as an advanced step. 
The current TyrA assemblage consists of two subhomology groupings designated TyrAα (40 cohesion groups) and TyrAβ (18 cohesion groups). Evidence in support of the thesis that the TyrAβ subhomology grouping consists of TyrA enzymes that interact with either fused domains or complexed domains of other enzymes is presented. Multiple examples of the logic used to make evolutionary conclusions are given, and examples of tentative evolutionary scenarios that are experimentally testable are also given. Motif variations conserved within a cohesion group are discussed as reflections of probable mechanistic variations of an otherwise widely conserved mechanism. How a rationale can be developed to select key organisms that have ideal phylogenetic placements to advance an overall analysis by filling information gaps with experimental data is demonstrated. Systematic procedures to manage and organize otherwise overwhelming amounts of data are described. Web resources are introduced, which are interactive and freely available. These include a set of character state snapshots that are displayed on a sortable set of cohesion group trees using tools developed at the SEED (http://theseed.uchicago.edu/FIG/Html/tyrASubsystem.html). This resource includes a viewer link that displays the context of gene organization around tyrA genes within a cohesion group. The approaches herein applied should be easily applicable to other metabolic subsystems. TyrA AND l-TYROSINE BIOSYNTHESIS Enzyme Order Alternatives Dictate Substrate Specificity Patterns l-Tyrosine biosynthesis almost always deploys a member of the TyrA family, the subject of this review. The alternative flow routes that proceed from prephenate to l-tyrosine are shown in Fig. 1.
Among the enzymes of amino acid biosynthesis, those of the TyrA family have perhaps been the most widely surveyed in comparative enzymological studies. The TyrA protein family includes enzymes of varied specificities that have in common the catalysis of an oxidative, irreversible reaction in l-tyrosine biosynthesis in all three domains of life. The single known exception to this general physiological role within the homology family is 4-amino-prephenate dehydrogenase, a sparsely distributed enzyme involved in antibiotic synthesis in some species of Streptomyces (7, 81). The universal overall reaction (which includes the latter functional role) involves oxidative decarboxylation and aromatization of one of several possible cyclohexadienyl substrates in the presence of a pyridine nucleotide cosubstrate. Protein families such as the TyrA protein family that can accomplish related but different reactions under the umbrella of a common overall chemistry are herein referred to as pliant proteins. The final two reactions of l-tyrosine biosynthesis consist of an aminotransferase step and the TyrA-mediated dehydrogenase step, which follow from prephenate, an obligatory cyclohexadienyl precursor of l-tyrosine. However, these two steps can occur in either order, a phenomenon that accounts for two mutually exclusive intermediates that may intervene between prephenate and l-tyrosine. If prephenate is first transaminated, then l-arogenate (a cyclohexadienyl amino acid) (82) is generated; if prephenate first undergoes oxidative decarboxylation, then 4-hydroxyphenylpyruvate is generated. Hence, some dehydrogenases of tyrosine biosynthesis are specific for prephenate (prephenate dehydrogenase), whereas others are specific for l-arogenate (arogenate dehydrogenase). A third qualitative category of specificity is one where either of the cyclohexadienyl substrates can be accepted (dual-specificity cyclohexadienyl dehydrogenases). The latter category is probably the most widespread. 
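The two alternative step orders described above can be written out as a small lookup table. The following is only an illustrative sketch: the table `REACTIONS` and the function `routes` are hypothetical names introduced here, and the reactions encoded are limited to the four conversions stated in the text.

```python
from itertools import permutations

# (substrate, enzyme step) -> product, as stated in the text above.
REACTIONS = {
    ("prephenate", "aminotransferase"): "L-arogenate",
    ("prephenate", "dehydrogenase"): "4-hydroxyphenylpyruvate",
    ("L-arogenate", "dehydrogenase"): "L-tyrosine",
    ("4-hydroxyphenylpyruvate", "aminotransferase"): "L-tyrosine",
}

def routes(start="prephenate"):
    """List the metabolite sequence for each possible order of the two steps."""
    paths = []
    for order in permutations(["aminotransferase", "dehydrogenase"]):
        path = [start]
        for step in order:
            # Apply the next enzyme step to the most recent metabolite.
            path.append(REACTIONS[(path[-1], step)])
        paths.append(path)
    return paths

for path in routes():
    print(" -> ".join(path))
```

Both orders end at l-tyrosine and differ only in the intermediate, which is why the two intermediates are mutually exclusive within a single route.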
Cyclohexadienyl dehydrogenases exhibit substantial quantitative variation in that the degree of preference for one substrate or the other varies through a wide range. The TyrA family of dehydrogenases also exhibits varied specificities for the pyridine nucleotide substrate that can be accepted. Thus, some are specific for NAD+, some are specific for NADP+, and some will utilize either cofactor (again varying through a wide continuum of preference for the cofactor). In the following assessment of substrate specificities, it should be noted that various technical pitfalls for working with crude extracts and partially purified enzyme preparations have been recognized over the years. Adequate controls are needed to ensure that prephenate is not contaminated with l-arogenate or prephenyllactate (83), that a phosphatase is not converting NADP+ to NAD+ to give a false-positive result for NADP+ reactivity, that an oxidase is not recycling a reduced cofactor product back to the oxidized form to give unduly low (or null) apparent activities, and that apparent prephenate dehydrogenase activity is not in fact due to the production of l-arogenate via prephenate aminotransferase. Functional complementation of a mutant deficient in a known prephenate-specific dehydrogenase is not proof that the heterologous donor gene specifies a prephenate-specific enzyme because prephenate, accumulated at abnormally high concentrations behind the block, can be anomalously transaminated in vivo to l-arogenate. Indeed, a tyrA mutant of Salmonella enterica serovar Typhimurium, widely used as a source of prephenate, is also the main source of l-arogenate for biochemical preparations (8). Some of these phenomena have been responsible for errors in older literature. Saccharomyces cerevisiae is an example of an organism that has sometimes been assumed to possess a prephenate-specific TyrA dehydrogenase, but we are not aware of rigorous enzymological data in support of this. 
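The qualitative categories discussed above can be summarized as a simple decision rule. The sketch below is an assumption-laden illustration, not a published procedure: the function names, the `limit` parameter (a detection limit), and the input activity values are all hypothetical, and the rule presumes that the measured activities are already free of the artifacts listed above (contaminated substrate, phosphatase, oxidase, or aminotransferase interference).

```python
from typing import Optional, Tuple

def _choice(a: float, b: float, only_a: str, only_b: str,
            both: str, limit: float) -> Optional[str]:
    """Pick a label depending on which of two activities exceed the limit."""
    has_a, has_b = a > limit, b > limit
    if has_a and has_b:
        return both
    if has_a:
        return only_a
    if has_b:
        return only_b
    return None  # no activity detected with either alternative

def classify_tyra(act_prephenate: float, act_arogenate: float,
                  act_nad: float, act_nadp: float,
                  limit: float = 0.0) -> Tuple[Optional[str], Optional[str]]:
    """Return (substrate category, cofactor category) for one enzyme.

    `limit` is a hypothetical detection limit in the same units as the
    activities; any activity above it counts as real.
    """
    substrate = _choice(act_prephenate, act_arogenate,
                        "prephenate dehydrogenase",
                        "arogenate dehydrogenase",
                        "cyclohexadienyl dehydrogenase", limit)
    cofactor = _choice(act_nad, act_nadp, "NAD+", "NADP+",
                       "NAD+ or NADP+", limit)
    return substrate, cofactor

# Made-up numbers: strong prephenate activity, weak arogenate activity.
print(classify_tyra(100.0, 2.0, 50.0, 0.0))
```

Under this rule, an enzyme with a strong preference for one substrate but a detectable activity with the other is still placed in the dual-specificity category; the degree of preference would have to be recorded separately.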
Strict specificity for prephenate. Prephenate-specific dehydrogenases (TyrAp) are thus far limited to two experimental documentations. One is within a large clade of gram-positive bacteria related to Bacillus subtilis, where the most detailed enzymological characterization remains that described previously by Champney and Jensen (17). Here, the specificity for prephenate is coupled with specificity for NAD+. The other set of experimental data is from Gluconobacter oxydans, Brevundimonas vesicularis, Brevundimonas diminuta, and species of Acetobacter (13; data not shown). This group couples specificity for prephenate with specificity for NADP+. (All of the latter organisms are also distinctive in the possession of two other character states: an arogenate-specific dehydratase for phenylalanine synthesis and a single 3-deoxy-d-arabino-heptulosonate synthase of a distinctive homology type [AroAII] [38], which is sensitive to tryptophan-mediated feedback inhibition.) Unfortunately, genomes of species of Brevundimonas (previously named Pseudomonas) have yet to be sequenced. Caulobacter crescentus is inferred to have a prephenate/NADP+-specific dehydrogenase by virtue of its close relationship with Brevundimonas species within the family Caulobacteraceae as well as the motif similarity in the G-rich cofactor discriminator region (see Fig. 4).
Although TyrA from Escherichia coli is widely referred to as a prephenate dehydrogenase, it is properly designated a cyclohexadienyl dehydrogenase since it exhibits a poor but distinct ability to utilize l-arogenate as an alternative substrate (4, 5). Actually, most of the closely related sister enterics within the lower Gammaproteobacteria, although also exhibiting a clear preference for prephenate, have relatively more dehydrogenase activity with l-arogenate than does E. coli (4). Broad specificity. An early wide-ranging enzymological survey revealed the ubiquity of dual-specificity cyclohexadienyl dehydrogenases (TyrAc) (13). The implication is that an uncertain mixture of both orders of reaction may be ongoing simultaneously in a single organism. Beyond the many subsequent characterizations of partially purified enzymes cited in the following references, detailed studies of purified cyclohexadienyl dehydrogenases include those cloned from Zymomonas mobilis (86), Erwinia herbicola (75), and Pseudomonas stutzeri (77). Strict specificity for l-arogenate. l-Arogenate-specific dehydrogenases (TyrAa), also fairly widespread in nature, have been purified and characterized from a cyanobacterium (Synechocystis sp.) (10) and from a higher plant (Arabidopsis thaliana) (64). All photosynthetic bacteria and photosynthetic eukaryotes studied thus far possess l-arogenate-specific, NADP+-specific dehydrogenases. This specificity combination is present in the enzymes from red algae and green algae (9) as well as from Euglena gracilis (14). Coryneform bacteria, other actinomycetes, and Nitrosomonas europaea exemplify bacteria whose possession of l-arogenate-specific dehydrogenases is well documented (see reference 67 and references therein). 
Although the Nitrosomonas enzyme provides yet another example where specificity for the l-arogenate/NADP+ couple exists, the l-arogenate-specific enzymes from coryneform bacteria will utilize either cofactor, whereas l-arogenate-specific enzymes from most actinomycetes (39, 40) other than coryneform bacteria exhibit NAD+ specificity. One plausible and interesting selective basis for the enzymatic utilization of l-arogenate and the avoidance of 4-hydroxyphenylpyruvate as an intermediate of l-tyrosine biosynthesis is to prevent cross-pathway complications in cases where 4-hydroxyphenylpyruvate has additional functional roles in metabolism that could lead to futile cycling. For example, the catabolism of l-tyrosine often deploys an initial transamination step that generates 4-hydroxyphenylpyruvate, which could wastefully enter the biosynthetic pathway. An additional example is when 4-hydroxyphenylpyruvate formed from l-tyrosine is utilized as a biosynthetic precursor of plastoquinone and vitamin E, as is uniquely typical of photosynthetic organisms. It is likely no accident that photosynthetic organisms typically utilize l-arogenate as an obligatory intermediate of l-tyrosine biosynthesis, thus avoiding the possibility that 4-hydroxyphenylpyruvate molecules that should be plastoquinone precursors would erroneously enter the l-tyrosine biosynthetic pathway (futile cycling). It is an intriguing example of metabolic plasticity that the latter coupling of biochemical pathways (l-arogenate for l-tyrosine biosynthesis and 4-hydroxyphenylpyruvate for plastoquinone/vitamin E biosynthesis) results in a novel situation where l-arogenate is a precursor of 4-hydroxyphenylpyruvate, with l-tyrosine serving as the intermediate. Thus, in this case, 4-hydroxyphenylpyruvate, rather than being an intermediate of tyrosine biosynthesis, is a following, posttyrosine intermediate of plastoquinone biosynthesis. Patterns of substrate specificity and regulatory interplay in Tyr/Phe branches. 
Organisms such as Bacillus subtilis that deploy a specific prephenate dehydratase and a specific prephenate dehydrogenase at the prephenate branchpoint (the classic pathway configuration) have a regulatory domain known as the ACT domain (49) attached to each of the competitively positioned enzymes to accomplish direct feedback inhibitions that are easily visualized. However, a less straightforward (albeit rather common) pattern for the biosynthesis of l-phenylalanine and l-tyrosine in nature is the utilization of l-arogenate for l-tyrosine synthesis but not for l-phenylalanine synthesis. This occurs in cyanobacteria (69), coryneform bacteria (24-26), and other actinomycetes such as Amycolatopsis methanolica (1). In fact, in the absence of early information that l-arogenate could be a precursor of phenylalanine, l-arogenate was initially named “pretyrosine” (69). With this pathway configuration (consult the figure at http://www.aropath.lanl.gov/Visualizations/TyrPath/TyrPath.htm), the tyrosine branch is unsuited for direct allosteric control. This is because at the branchpoint in this pathway configuration, the prephenate aminotransferase reaction is catalyzed by an aromatic aminotransferase, none of which have ever been found to be subject to allosteric control. It seems likely that catalytic interference caused by the structural overlap of the l-tyrosine end product with the substrates that can be accommodated by aromatic aminotransferases would account for this. On the other hand, the phenylalanine branch is well equipped for allosteric control (since prephenate dehydratase [PheA], which competes with prephenate aminotransferase at the prephenate branchpoint, catalyzes an irreversible initial step of substrate commitment). The ACT domains of cyanobacterial and coryneform PheA proteins mediate a novel mechanism of control to balance flux to both end products. 
PheA is subject to opposing influences of allosteric activation by l-tyrosine and allosteric feedback inhibition by l-phenylalanine. Starvation for l-phenylalanine enhances the flow of prephenate to l-phenylalanine due to an unrestrained PheA enzyme that is not only transiently free from feedback inhibition by l-phenylalanine but also activated by endogenous l-tyrosine. On the other hand, starvation for l-tyrosine results in the potent inhibition of PheA by endogenous l-phenylalanine, which relieves prephenate aminotransferase from competition with PheA at the branchpoint, thus enhancing flux toward tyrosine. In this manner, l-tyrosine synthesis is indirectly regulated by an enzyme of l-phenylalanine synthesis. It is intriguing that Pseudomonas aeruginosa exhibits a similar pattern whereby flux to l-phenylalanine is regulated directly and flux to l-tyrosine is regulated indirectly. Here, rather than deploying an arogenate dehydrogenase, a cyclohexadienyl dehydrogenase is used. Since the sole chorismate mutase for aromatic biosynthesis is fused to prephenate dehydratase, prephenate is channeled toward l-phenylalanine preferentially. Potent feedback inhibition of prephenate dehydratase by l-phenylalanine allows the release of prephenate from the complex and its utilization for l-tyrosine biosynthesis. This has been described as a channel-shuttle mechanism of regulation (15). With the background that TyrA proteins that are specific for prephenate are suitable for highly sensitive allosteric control and therefore likely to possess an allosteric domain such as the ACT domain, one might expect that all TyrA proteins that are fused with an ACT domain would be prephenate specific or at least exhibit an overwhelming preference for prephenate. However, TyrA from Streptomyces has an ACT domain but has been reported to be l-arogenate specific (39, 40). 
This is surprising because the implied inhibition of arogenate dehydrogenase by l-tyrosine could occur, albeit with less refinement, via direct product inhibition without an ACT domain. Moreover, the selective value of this inhibition, however implemented, is questionable because it would cause the accumulation of l-arogenate, which cannot enter the l-phenylalanine pathway directly, requiring back-transamination to prephenate first. One possible mechanism to explain the role of an ACT domain in keeping phenylalanine and tyrosine synthesis balanced would be for l-phenylalanine to activate arogenate dehydrogenase (via the ACT domain) in addition to inhibiting prephenate dehydratase. Another possibility is that Streptomyces might deploy an arogenate dehydratase instead of the much more ubiquitous prephenate dehydratase, thus placing l-arogenate at the metabolic branchpoint (an alternative pathway pattern). If so, backed-up l-arogenate caused by the inhibition of arogenate dehydratase and arogenate dehydrogenase by l-phenylalanine and l-tyrosine, respectively, may in turn feedback inhibit the initial common-pathway step of aromatic biosynthesis (in a pattern of sequential feedback inhibition similar to that discovered in higher plants) (21). This illustrates how an organized basis for desirable experimental inquiries can be driven by detailed analyses that are grounded in phylogenetic context, a point made recently by Osterman (58). Coexisting Pathway to l-Tyrosine in Some Anaerobic Organisms It should be noted that in some cases, a second interesting pathway of tyrosine biosynthesis coexists with the chorismate pathway. This second pathway can convert aryl acids to aromatic amino acids and is probably of limited distribution in anaerobes. It has been shown (63) that Methanococcus maripaludis illustrates the ability to scavenge environmental 4-hydroxyphenylacetate produced by the microbial community via peptide catabolism. 
Following activation to the coenzyme A thioester, reductive carboxylation, and transamination, the l-tyrosine product spares the use of the more expensive de novo pathway derived from chorismate. This aryl acid pathway is well integrated by regulation, such that it is the first-choice option, favored over the coexisting chorismate-derived pathway whenever 4-hydroxyphenylacetate is available. How Common Is Variation of Substrate Specificity? Enzymes are so well known for the truly remarkable specificities which often exist that an impression endures that broad-specificity enzymes are not common. However, aside from enzymes such as aminotransferases, which typically possess broadly overlapping substrate specificities (36), many enzymes also carry latent specificity potentials that can be enhanced under positive selective conditions (3, 52). Primordial enzymes with broad substrate specificity are central to the “recruitment hypothesis” (sometimes called the “patchwork hypothesis”) whereby differentially narrowed specificities and regulatory properties were attached to gene products of duplicated genes (34). These genes form paralog families, distinguished by differentially specialized functions but sharing a common catalytic mechanism and united by the ability to regain one or more of the related functions. In contemporary experimental systems, the latter expression of latent catalytic abilities is obtained by the selection of suppressor mutations. Two categories of substrate ambiguity exist: (i) those confined to operation within a pathway where the order of reaction steps can vary (same-pathway ambiguity) and (ii) those where an enzyme is competent for two or more alternative reactions that belong to different pathways (multipathway ambiguity). Same-pathway ambiguity. The TyrA family exemplifies same-pathway ambiguity. In most cases, the chemistry needed to build a given molecule dictates a particular order of steps that must be followed. 
In the case of l-tyrosine biosynthesis, modification of the side chain (via aromatic aminotransferase) and decarboxylation/aromatization (via dehydrogenase) are not interdependent. Thus, the overall conversion of prephenate to l-tyrosine can be accomplished with either order of steps. This is potentially true for any pathway where enzymatic chemistries performed are independent of one another. It would not be surprising if many such ambiguities exist but have not yet been recognized. For example, within the early common aromatic pathway, dehydroquinate proceeds to shikimate in two steps: dehydration (dehydroquinate dehydratase) and reduction (shikimate dehydrogenase). There is no reason a priori that these two steps could not occur in the opposite order, in which case quinate (rather than dehydroshikimate) would be the unique intermediate. Quinate dehydrogenase is widely known as a catabolic enzyme but potentially could perform as a biosynthetic enzyme in some systems. Multipathway ambiguity. A fuller modern appreciation of the extent of substrate ambiguity has been greatly accelerated by the contemporary surge in research designed to find and exploit substrate ambiguity for biotechnological objectives. It has become increasingly apparent with modern techniques of metabolite detection that the number of metabolites present in an organism far exceeds the number of genes that would be required if the gene product/enzymes were specific (66). Macchiarulo et al. (50) applied a sophisticated docking algorithm in a computational study that revealed a very high potential for cross-reactivity of endogenous metabolites and enzymes in metabolic reactions. There are two levels of enzymatic promiscuity. In addition to substrate ambiguity (34), it has become clear that surprisingly many enzymes can catalyze seemingly disparate reactions (catalytic promiscuity) that are normally classified as different types of reactions (55). 
Kurakin (46) made the case that both substrate ambiguity and catalytic promiscuity are in fact expected features in a new paradigm of dynamic and adaptive protein structure. In this paradigm, major and established biochemical pathways operate against a background where many diverse “micrometabolites” are fortuitously generated, a background thought to supply latent evolutionary potential. Even a minimal sampling of the very recent literature reveals a rapid proliferation of new examples. These include (i) a detailed assessment of the basis for the catalytic promiscuity of E. coli alkaline phosphatase, which can also act as a sulfatase (16); (ii) a new family of lactonases that hydrolyze a variety of lactones, possess low phosphotriesterase activities, and have been shown to be the source of a newly evolved and highly efficient phosphotriesterase (2); (iii) a gentisate dioxygenase that also functions with 1,4-dihydroxy-2-naphthoate and salicylate (31); (iv) an ATP-dependent hexokinase from Sulfolobus tokodaii that can phosphorylate glucose, mannose, glucosamine, and N-acetylglucosamine (54); (v) a higher-plant isopropylmalate synthase that not only condenses acetyl coenzyme A (acetyl-CoA) with 2-ketoisovalerate but will also accept 2-oxo acid substrates of two-carbon to six-carbon lengths (19); (vi) a number of variations in the substrate specificities of glutathione synthesis enzymes in comparison to E. coli, Streptococcus agalactiae, and Clostridium acetobutylicum (42); (vii) an amino acid racemase from Pseudomonas putida with an unusual breadth of specificity for amino acids (43); (viii) ATP-forming acetyl-CoA synthetases that accept acetate, propionate, and some longer straight- and branched-chain acyl substrates (32); (ix) an isochorismate pyruvate lyase from Pseudomonas aeruginosa that also has weak chorismate mutase activity (45); and (x) Sulfolobus species that condense pyruvate and aldehydes with two to four carbon atoms (phosphorylated or not) (74). 
d-2-Hydroxyacid dehydrogenase from Haloferax mediterranei exhibits interesting parallels to the broad-specificity TyrA variants. This d-stereospecific enzyme has broad specificity for alpha-keto carboxylic acids and dual coenzyme specificity (NADH and NADPH) (20). This is striking because most members of this family are NADH dependent. A thorough and scholarly recent review on the subject of enzyme promiscuity was written by Khersonsky et al. (41). It should be noted that the above-described consideration of same-pathway and multipathway ambiguities is not all-comprehensive with respect to the large topic area of variations that occur in reaction/substrate/cofactor specificity, e.g., phosphorylation in alternative positions of some carbohydrates by the same enzyme and alternative positions of cleavage in the same peptide by protease, etc. The TyrA Supradomain The Structural Classification of Proteins (SCOP) database defines a protein domain as an evolutionary unit that can function independently or that can interact with other domains in a multidomain protein to achieve function. TyrA proteins exemplify a case where an N-terminal Rossmann fold and a C-terminal domain comprise a “supradomain” (72), a combination that is essential for catalysis mediated by TyrA. Sun et al. (71) noted that TyrA proteins belong to the “6-phosphogluconate dehydrogenase C-terminal domain-like superfamily” in the SCOP structural classification of protein domains. This superfamily has a ubiquitous N-terminal Rossmann fold joined to a C-terminal extension that is family specific. The latter extension has a common core that is formed around two long antiparallel helices. A supradomain of about 180 amino acids that is central to TyrA proteins has been identified (10, 77). All TyrA sequences used in this analysis have been trimmed to the boundaries of the supradomain and are available for download (http://theseed.uchicago.edu/FIG/tyra_sequence.cgi). 
Well-characterized TyrA proteins from Neisseria gonorrhoeae (70), Zymomonas mobilis (86), and Synechocystis sp. (10) as well as the engineered TyrA domain from Pseudomonas stutzeri (77) represent phylogenetically well-spaced proteins (cohesion groups 2, 9, 12, and 16) that exemplify the minimal domain length. It has been suggested (77) that the foregoing four sequences, although of different specificities, define a basic catalytic domain. In this model, it was proposed that the specificity for the side chains of the substrates utilized would parallel the specificity for side chains of inhibitors that are postulated to bind directly to the active site. The only difference between the prephenate and l-arogenate substrate molecules is the side chain, which remains unaltered in the coupled overall reactions of oxidative decarboxylation and aromatization (Fig. 1). Cohesion Groups Rigorous unit of analysis. Unlike 16S rRNA sequences, which have been famously used to obtain genomic phylogenies, protein sequences are of limited value for making phylogenetic inferences over wide phylogenetic distances, especially if the proteins are neither great in length nor highly conserved. Valid phylogenetic trees for proteins require an adequate continuum of close relatives. Indeed, where genome representation is sufficiently dense in subsections of the overall phylogenetic tree, protein trees can be more informative than 16S rRNA sequences because of the greater resolving power of amino acid variation (84). Xie et al. (80) assembled trees for the seven individual tryptophan pathway enzymes from then-available prokaryotes in a comprehensive analysis in which divergent paralogs and xenologs engaged in specialized metabolic activities were sorted out from the genes dedicated to primary biosynthesis. 
Examination of the distribution of gene fusions and gene organization patterns in a context where these distributions were mapped to the 16S rRNA tree elucidated a variety of lineage-specific evolutionary trends. Landmark evolutionary events of operon splitting and rejoining could be reconstructed by following individual divergences in narrow phylogenetic slices and placing these together in a broader phylogenetic context. With avoidance of errors due to ancient paralogy and LGT, one can deduce the most likely character state(s) that represents a given phylogenetic node. The hierarchical placement of each node is determined by the membership of a cohesion group. The more dynamic the evolutionary pace and therefore the greater the divergence, the more narrow (albeit more informationally enriched) the phylogenetic piece captured and therefore the more shallow the position of the node will be. If nodes at the bottom of the phylogenetic tree are sufficiently well represented to deduce any given character state(s) at those nodes, one can hope to apply parsimony principles to deduce the most likely common ancestor at progressively more ancient nodes, thus moving backwards in evolutionary time. It was shown (80) how contexts of flanking genes at relatively shallow hierarchical levels can illuminate which of two evolutionary states is ancestral and which is derived. Expansion via concatenation: supercohesion groups. The above-cited work was the basis for a follow-up effort in 2004 (78), which showed that a concatenation of the seven tryptophan pathway proteins yielded protein trees made up of individual sections that, while exhibiting an uncertain connectivity with one another, were each congruent with a portion of the 16S rRNA tree. Ten orphan concatenates were also obtained from genomes with no close relatives among the finished genomes. The seven single-protein tryptophan pathway trees were compared to the concatenate tree. 
They faintly resembled the concatenate tree but with much weaker support (depending upon highly individualistic degrees of conservation and protein length). Since the cohesion group approach is fundamental to the thrust of this review, some clarification of terminology is in order. Proteins whose sequences cluster together with high bootstrap values on a phylogenetic tree comprise a cohesion group. Most or all of these proteins are from organisms that also cluster together on a 16S rRNA tree, and this fraction of the cohesion group defines an evolutionary progression of the encoding gene in a vertical genealogy. Genes encoding one or more members of a cohesion group may have been transferred to phylogenetically distant organisms via LGT, and the protein thus will not fit 16S rRNA expectations. Such cohesion group members are called intruder sequences, and the genome possessing it is mosaic with respect to the encoding gene. Cohesion groups that are assembled by the concatenation of two or more proteins of a metabolic pathway are called supercohesion groups. A protein or concatenated protein that is too divergent to share membership in cohesion groups or supercohesion groups is called an orphan sequence and is the sole occupant of an orphan cohesion group or supercohesion group. Tryptophan pathway congruency groups within the Bacteria were so named because most or all members of a given group were congruent with 16S rRNA expectations. However, some congruency groups contain “intruder” sequences that, due to LGT, are not congruent with 16S rRNA expectations. To avoid semantic confusion, we herein rename these groups “cohesion groups,” since each group is a uniformly cohesive collection of sequences that all originated from a relatively recent ancestor. 
A given protein member of a cohesion group either is congruent with 16S rRNA expectations and therefore embedded within a vertical genealogy or is an intruder sequence that was translocated to an alien host organism via LGT. LGT of several whole-pathway trp operons and a few partial-pathway trp operons complicated but did not obscure the vertical genealogical trace (78). Indeed, the events of paralogy and xenology could be sorted out because of their demonstrated context within a discernible genealogical trace. The cohesion group approach with the tryptophan pathway subsystem facilitated new and very detailed evolutionary inferences that could be broadly applied to the kingdoms Bacteria and Archaea. In this paper, the cohesion group approach is extended to another branch (TyrA) of aromatic amino acid biosynthesis, with an ultimate objective of extending and integrating the knowledge base to the remainder of this large, multibranched pathway (and indeed with related metabolic subsystems). TyrA HOMOLOGY ISLANDS: AN ASSEMBLAGE OF COHESION GROUPS Multimember and Orphan Cohesion Groups A set of 347 trimmed catalytic core TyrA sequences from all three domains of life were aligned with manual adjustments as needed, particularly at the extreme N-terminal region, where alignment programs consistently yield poor results for the G-rich region that discriminates pyridine nucleotides. The refined alignment was used to obtain a phylogenetic tree. In order to eliminate biases caused by the overrepresentation of relatively large numbers of sequences from closely related organisms, nodes having bootstrap values in excess of a threshold value were collapsed (see below). A single arbitrarily chosen sequence was used to represent each cohesion group at a collapsed node, and these were then used to construct another alignment. Some cohesion “groups” contain a single sequence, and these unnumbered orphans are provisionally designated TyrCG-O. 
So far, all of the orphan sequences are from the Bacteria. The final alignment, which had received an input of the 18 orphan sequences plus a representative sequence from each of 40 multimember cohesion groups, produced a new tree in which each branch represents a cohesion group. The resulting bifurcated tree is shown in Fig. 2.
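The collapsing step described above can be illustrated with a minimal sketch. It is not the procedure actually used for the TyrA tree: the `Node` class, the toy tree, the sequence names, and the threshold of 90 are all assumptions made for illustration, since the text does not state the bootstrap cutoff at this point. The sketch walks down from the root, treats the first node whose bootstrap support exceeds the threshold as one cohesion group, treats any unsupported leaf as an orphan, and takes the first sequence of each group as its arbitrary representative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy tree node (hypothetical): a leaf has a name, an internal node has support."""
    name: str = ""
    support: float = 0.0
    children: list = field(default_factory=list)


def leaves(node):
    """Return the leaf names below a node, in left-to-right order."""
    if not node.children:
        return [node.name]
    names = []
    for child in node.children:
        names.extend(leaves(child))
    return names


def collapse(node, threshold):
    """Split the tree into groups at the shallowest nodes whose support exceeds threshold."""
    if not node.children:
        return [[node.name]]          # unsupported leaf: an orphan "group" of one
    if node.support > threshold:
        return [leaves(node)]         # well-supported clade: one cohesion group
    groups = []
    for child in node.children:
        groups.extend(collapse(child, threshold))
    return groups


# Hypothetical example tree; names and support values are invented.
tree = Node(support=20, children=[
    Node(support=95, children=[Node("A1"), Node("A2"), Node("A3")]),
    Node("orphan1"),
    Node(support=40, children=[
        Node("B1"),
        Node(support=99, children=[Node("B2"), Node("B3")]),
    ]),
])

groups = collapse(tree, threshold=90)           # assumed cutoff
representatives = [g[0] for g in groups]        # one arbitrary sequence per group
print(groups)
print(representatives)
```

Because each group contributes exactly one representative to the second alignment, a three-member group and a single orphan carry equal weight, which is the property the text emphasizes.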
By design, the orphan sequences used each have as much impact on the alignment (and consequent tree) as do cohesion groups with large numbers of members. The TyrA cohesion groups can be considered to be generally coherent islands in phylogenetic space. As more sequences accumulate, new orphan sequences will emerge, some new sequences will group with previous orphans to yield a multimembered (and newly numbered) cohesion group, and some cohesion groups can be expected to merge as phylogenetic gaps are filled. Eventually, given a sufficient accumulation of new sequences to fill gaps in the phylogenetic space, merged cohesion groups can be expected to yield fewer TyrA cohesion groups that will capture larger phylogenetic slices at deeper hierarchical levels. A complete compilation of the current cohesion group membership (extended table) can be accessed at the SEED (http://theseed.uchicago.edu/FIG/Html/TyrAExtended.html). This is linked to the “Protein Page” at the SEED, which in turn is linked to many popular database resources, including the NCBI (see resources in the Appendix). The extended table is a key interactive resource that displays the source and certain properties of each TyrA sequence. Where it seems clear that a given sequence or group of sequences in a given cohesion group arrived in the host organism by LGT, they are labeled as “intruder sequences.” The taxon level of the organisms possessing the TyrA sequences in a given cohesion group (but excluding intruder sequences) is given in the leftmost column. Organisms with TyrA sequences deemed to be intruder sequences, if present, are listed at the bottom of a given cohesion group. Some cohesion groups are described as being “unresolved phylogenetic mixtures” because one or more of the members appear to be intruder sequences, but it cannot yet be deduced which is the intruder and which is not. 
Each entry is linked to the NCBI taxonomy browser, to the system used to apply organism acronyms, to the interactive Protein Page at the SEED, and to NCBI gene records. Certain other properties discussed in this review, such as gene fusions, are also tracked in the extended table. Xenolog Intruders Multimember cohesion groups are assemblages that are generally congruent with a vertical genealogy, although interesting xenolog intruder sequences were occasionally identified. For example, cohesion group TyrCG-1 contains 40 sequences from a sublineage of Gammaproteobacteria (lower Gammaproteobacteria) that cluster together as expected. Two additional member sequences from several strains of Nostoc (cyanobacteria) are also present as xenolog intruders (that is, a tyrA gene from within the enteric lineage was presumably passed to a common ancestor of Nostoc by LGT). These intruder sequences did not displace the native tyrA genes because Nostoc strains possess a second gene encoding a TyrA sequence which belongs to TyrCG-16, a large cohesive grouping of orthologs present in all 16 finished cyanobacterial genomes available. Thus, the Nostoc tyrA genes in TyrCG-16 are part of an ortholog collection that fits expectations of a vertical genealogy, whereas the Nostoc tyrA genes in TyrCG-1 are not congruent with 16S rRNA expectations (and hence are assumed to be xenolog intruders). The latter xenolog intruders are thought to play a specialized functional role in secondary metabolism (67), and indeed, it has recently been asserted that these genes participate in the provision of l-tyrosine precursor molecules dedicated to the formation of scytonemin, an indole-alkaloid that functions as a sunscreen agent (68). What is the rationale for the conclusion that the Nostoc genes in the above-described example arrived as intruder sequences rather than the opposite scenario, namely, that the genes from the lower Gammaproteobacteria are LGT intruders derived from Nostoc? 
Nostoc species are in the same taxon family as species of Anabaena, and Anabaena lacks the intruder sequences. Hence, if Nostoc were the LGT donor, the LGT would have occurred at a relatively recent time after its divergence from the genus Anabaena. In order to account for the possession of the LGT-derived gene by all of the lower Gammaproteobacteria, this fairly recent time would have had to overlap with the more ancient time when the common ancestor of lower Gammaproteobacteria existed, i.e., before divergence to various orders and after divergence from the upper Gammaproteobacteria. These times of Nostoc/Anabaena divergence and upper Gammaproteobacteria/lower Gammaproteobacteria divergence clearly do not overlap, as can be qualitatively assessed by inspection of the appropriate nodes of a 16S rRNA tree. At a hierarchical level of superorder for lower Gammaproteobacteria compared with a level of genus for Nostoc, the lower Gammaproteobacteria lineage is qualitatively older than the Nostoc lineage (even allowing for the uneven hierarchical taxon designations that exist). A gene from a younger lineage cannot have been passed to a common ancestor of the older lineage via LGT because that ancestor would have already diverged very substantially. In short, the common ancestor of lower Gammaproteobacteria could not have been an LGT recipient from a Nostoc donor because the more recent Nostoc lineage had not yet separated at the time when the common ancestor of lower Gammaproteobacteria emerged. Accordingly, it would be feasible for Nostoc to be an LGT donor to only some restricted divergent portion of the lower Gammaproteobacteria membership but not to all of it. TyrCG-13 is striking because it contains all of the current TyrA sequences from two taxonomic classes (Flavobacteria and Epsilonproteobacteria), each belonging to a different phylum. One set must be derived from a relatively ancient intruder sequence that was acquired from a member of the other set via LGT. 
The rationale for concluding that TyrA sequences in the class Flavobacteria arose as an intruder that arrived via LGT from an Epsilonproteobacteria source is explained later in this paper (see Fig. 9).
Those cohesion groups labeled in the extended table as an “unresolved phylogenetic mixture” contain one or more xenolog intruders, but it is unclear which one is the donor and which one is the recipient. For example, TyrCG-27 contains three sequences from three different phyla. Since the Anaeromyxobacter and Rhodopirellula organisms are from phyla that have representation in other cohesion groups, an educated (but still uncertain) guess would be that sequences from the latter two organisms are intruder sequences derived from within the phylum Verrucomicrobia. Acquisition of more sequences from appropriate organisms should clarify this. As a second example, TyrCG-25 contains TyrA sequences from two organisms in different phyla. Petrotoga miotherma is assumed to carry an intruder TyrA sequence derived from a relative of Dictyoglomus miotherma by LGT, and this is based upon the following line of logic. Petrotoga miotherma has a fairly close relative, Thermotoga maritima, whose TyrA sequence is an orphan. Their TyrA sequences would be expected to belong to the same cohesion group because the divergence of TyrA into multiple cohesion groups is usually not seen below the taxon rank of family. Thus, considering the relationship of TyrA sequences from Petrotoga, Thermotoga, and Dictyoglomus, a single LGT event of transfer of TyrA from within the Dictyoglomus lineage to Petrotoga would simultaneously explain why the TyrA sequences from Dictyoglomus and Petrotoga belong to the same cohesion group and why the TyrA sequences from Petrotoga and Thermotoga do not belong to the same cohesion group. Thus, with the information presently available, this single-transfer scenario is the most parsimonious inference. Nevertheless, a conservative approach is taken to still label TyrCG-25 as an “unresolved phylogenetic mixture” until the inference made above can be verified or denied with the help of more genome sampling. 
Intra-Cohesion-Group Intruders Even where a set of TyrA cohesion group members are congruent with a 16S rRNA tree, it must be clarified that one cannot assert an absolute absence of LGT events within the lineage. But such LGT events would have been between very close relatives, where LGT can indeed be expected to occur most frequently (47). For example, TyrCG-18 contains 27 sequences from the class Bacilli. As such, these sequences are all congruent with 16S rRNA expectations at the hierarchical level of the class taxon, and we identified no intruders in the current TyrCG-18 membership. However, it is possible, and even likely, that there may have been LGT exchanges within the cohesion group. LGT events at this level will usually not be noticeable, but given a sufficiently large and well-spaced membership, it should be possible to sort out LGT donors and recipients. Along these lines, it is instructive to revisit the phenomenon whereby the trp operon has been inserted into the middle of a six-member aromatic pathway (aro) operon concomitant with the gain of the regulatory gene mtrB, the loss of trpAb from the trp operon, and the subsequent conscription of pabAb to perform the amidotransferase function for both the tryptophan and p-aminobenzoic acid pathways (80). Note that this constitutes a suite of four different, but interwoven, character states. At the time of the previous study, the organisms known to have these character states were limited to Bacillus subtilis, Bacillus halodurans, and “Bacillus stearothermophilus.” Taxonomic revision has resulted in the placement of “B. stearothermophilus” into a different genus, Geobacillus (53). An additional Geobacillus genome, G. kaustophilus, as well as some additional Bacillus species are now available. The trp operon insertion and the associated character states can now be updated. They are all present in both of the Geobacillus species and in the following clade of Bacillus species: B. clausii, B. subtilis, B. 
halodurans, and B. licheniformis. Other Bacillus species (B. cereus, B. anthracis, and B. thuringiensis) lack the trp operon insertion and the three associated character states. Thus, in light of these updates, the simplest scenario is that the trp operon insertion into the aro operon, the loss of trpAb, the broadened functional role of pabAb, and the gain of mtrB regulation occurred initially as dynamic innovations in Geobacillus. Subsequently, the supraoperon was transferred via LGT to a common ancestor of the Bacillus clade and was positioned in the aro operon region by displacement via the recombination of flanking homolog genes. The transferred fragment could have been as long as mtrA>mtrB>hepS>menH>hepT>ndk>cheR>aroG>aroB> aroF>trp operon>hisHb>tyrA>aroF>tpr (the supraoperon is shown in boldface type), with recombination perhaps occurring between the mtrA and tpr orthologs (consult Fig. 11 in reference 80 for a view of this conserved gene region). Note that this would have cotransferred the unique trp regulatory gene mtrB, which encodes TRAP (trp RNA binding attenuation protein) (28). The assertion of an intra-cohesion-group LGT that is herein made is amenable to confirmatory follow-up in that protein trees for most or all of the proteins encoded by genes that flank the trp genes should give the same result as that obtained with the TyrA protein tree, namely, that the proteins of one set of Bacillus species are more similar to their counterparts in Geobacillus than to the remaining set of Bacillus species. If so, a significant evolutionary jump (sufficient to define a new trp cohesion group) has occurred in Geobacillus, and the suite of new character states have fairly recently been passed to a common ancestor of a fraction of the Bacillus genus via LGT. Genes flanking the trp operon may not have been much different in comparison of the donor and recipient of LGT. 
Accordingly, TyrA proteins from all Bacillus species populate the same cohesion group regardless of LGT from Geobacillus or not. Indeed, TyrA proteins from the entire class Bacilli populate a single cohesion group, except for the Symbiobacterium thermophilum orphan. In contrast, the tryptophan subsystem has experienced such dynamic evolutionary changes within Geobacillus that a new trp supercohesion group (based upon the concatenation of Trp proteins) has emerged. This multicharacter set of genes has then exerted quite a profound effect, via LGT, upon a clade of closely related species in a nearby genus. Since Geobacillus strains are comprised of thermophilic species, the above-mentioned proteins in that fraction of Bacillus species that have a Geobacillus origin might tend to have retained the characteristics of high thermotolerance of Geobacillus. This is experimentally testable. In the near future, when small cohesion groups expand to a better size for analysis, it should be possible to obtain fine-tuned protein trees that will allow inferences of credible LGT events within a given cohesion group. The availability of more genomes representing the genera Bacillus and Geobacillus in particular (as well as the class Bacilli in general) should allow this to be accomplished with the trp/aro multigene system. Correspondence of Cohesion Groups with Formal Taxon Ranks The “extended table” at the SEED supplies in the leftmost column the highest-ranking formal taxonomic designations (from the NCBI taxonomy browser) that bound a given cohesion group. Cohesion groups capture their membership at different hierarchical levels, e.g., TyrCG-7 at the level of family, TyrCG-14 at the level of order, TyrCG-17 at the level of subclass, TyrCG-18 at the level of class, and TyrCG-16 at the level of phylum. 
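As a hedged illustration of how a bounding taxon rank could be read off for a cohesion group, the sketch below finds the lowest rank at which all non-intruder members carry the same taxon name. The rank list, the record layout, and the placeholder taxon names are assumptions for illustration only; they are not taken from the extended table or from the NCBI taxonomy browser.

```python
# Ranks ordered from highest to lowest (a simplified, assumed hierarchy).
RANKS = ["phylum", "class", "order", "family", "genus"]


def bounding_rank(members):
    """Return (rank, name) for the lowest rank shared by all non-intruder members."""
    native = [m["lineage"] for m in members if not m.get("intruder", False)]
    shared = None
    for rank in RANKS:
        names = {lineage.get(rank) for lineage in native}
        if len(names) == 1 and None not in names:
            shared = (rank, names.pop())   # still in agreement at this rank
        else:
            break                          # first disagreement: stop descending
    return shared


# Hypothetical cohesion group; every taxon name here is a placeholder.
group = [
    {"lineage": {"phylum": "PhylumA", "class": "ClassA", "order": "OrderA",
                 "family": "FamilyA", "genus": "GenusA1"}},
    {"lineage": {"phylum": "PhylumA", "class": "ClassA", "order": "OrderA",
                 "family": "FamilyA", "genus": "GenusA2"}},
    {"lineage": {"phylum": "PhylumB", "class": "ClassB", "order": "OrderB",
                 "family": "FamilyB", "genus": "GenusB1"}, "intruder": True},
]

print(bounding_rank(group))
```

Excluding flagged intruders before the comparison mirrors the convention stated for the leftmost column of the extended table; without that exclusion the example group would have no shared rank at all.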
TyrA sequences from higher plants and fungi populate TyrCG-95 and TyrCG-98 at the hierarchical level of kingdom (but note that the Eukaryota are vastly more subdivided taxonomically than are the Bacteria). We often found that organisms belonging to a formal class contained two or more TyrA cohesion groups that did not match any formal hierarchical subdivisions of that class, such as subclass or order. Names have been provided for many of these subdivided taxons. For example, the Gammaproteobacteria (a formal class) are represented by 10 cohesion groups that carry the following name labels: lower Gammaproteobacteria (TyrCG-1), upper Gamma_1proteobacteria (TyrCG-2), upper Gamma_2proteobacteria (TyrCG-4), upper Gamma_3proteobacteria (TyrCG-5), upper Gamma_4proteobacteria (TyrCG-6), upper Gamma_5proteobacteria (TyrCG-7), and four orphans (Acidithiobacillus ferrooxidans, Methylococcus capsulatus, Microbulbifer degradans, and Nitrosococcus oceani). A striking list of many divergent character state features of aromatic amino acid biosynthesis points to two distinct subdivisions of the class Gammaproteobacteria. We have termed these the lower Gammaproteobacteria and the upper Gammaproteobacteria. With respect to the multiple character states of aromatic amino acid biosynthesis and regulation, all of the formal Gammaproteobacteria taxon orders (except one) partition cleanly into either the lower Gammaproteobacteria or the upper Gammaproteobacteria. Thus, we treat the Gammaproteobacteria as being comprised of two superorders: (i) the lower Gammaproteobacteria, containing the orders Enterobacterales, Pasteurellales, and Vibrionales and most families within the Alteromonadales, and (ii) the upper Gammaproteobacteria, containing the orders Chromatiales, Oceanospirillales, Pseudomonadales, and Xanthomonadales and part of the Alteromonadales (67). The latter so far consist only of genera within the family Alteromonadaceae, e.g., Marinobacter and Microbulbifer. 
The wide variation in the taxon rank delineated by the organisms whose TyrA sequences belong to a particular cohesion group can be attributed to (i) differing evolutionary dynamics in different lineages and (ii) uneven and erratic taxonomic subdivisions in formal nomenclature schemes (i.e., generously sampled and highly studied groupings become subject to more subdividing than do sparsely represented groupings). In general, it is predictable that TyrA sequences from organisms belonging to the same formal taxon up to the level of family will belong to the same cohesion group and will share similar character state properties.

TWO TyrA SUBHOMOLOGY GROUPS

The Master Cohesion Group Alignment

TyrA is a single-homolog assemblage, but the TyrA tree bifurcates into two distinct groupings, as labeled in Fig. 2.
The multiple alignment is shown in Fig. 3.

Motif Variations Conserved at the Level of Cohesion Group

Note that some near-invariant residues differ in an occasional cohesion group. Whenever a near-invariant residue differs in a particular cohesion group but nevertheless is conserved in all members of that cohesion group, such deviations are shown in boldface green type in Fig. 3.

Four Regional Sequence Sections That Differentiate TyrAα from TyrAβ

Regions of sequence that clearly differentiate members of TyrAα from members of TyrAβ are indicated by numbers enclosed within diamonds (Fig. 3).

COFACTOR DISCRIMINATOR REGION

Specificity Motifs

The pyridine nucleotide-binding domain of TyrA proteins extends to well over the sequence midpoint and abuts the second domain without any linker region (bent, divergent arrows indicate the join points of the two domains in Fig. 3). Most of our curated TyrA sequences can be assigned to one of three specificity classes: specific for NAD+, specific for NADP+, or able to utilize either cofactor. NADP+-specific enzymes typically deploy one G/S/T/A residue at position 36, and this is followed most commonly by RS (but sometimes by RR or RK). A second pattern of NADP+ specificity (36G/A/S/TxxxRxR42) was recognized from the sequence from Gluconobacter oxydans, which is known experimentally to be NADP+ specific (and prephenate specific). The pattern from Fibrobacter succinogenes and Caulobacter crescentus matches this quite well. Here, the positively charged R residue, normally located at position 37, is shifted three positions downstream, and the R residue at position 42 may be significant as well. A broad capability to utilize either of the two cofactors is achieved by one of two variations: a 36GxxR39 motif and a 36N motif. The 36GxxR39 motif, as seen in some of the TyrA sequences from the Betaproteobacteria (70), resembles the 36G/S/T/AR37 motif of NADP+-specific enzymes. 
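The motif rules stated above lend themselves to a small lookup, sketched here under stated assumptions. The function name, the seven-residue window covering alignment positions 36 to 42, the order in which the rules are tried, and the example windows are all hypothetical; only the four residue patterns come from the text. The NAD+-specific pattern is not encoded because it is not described in this passage, so anything unmatched is reported as unassigned rather than guessed.

```python
def classify_cofactor_window(window):
    """Classify a 7-residue window spanning alignment positions 36-42 (assumed input)."""
    if len(window) != 7:
        raise ValueError("expected exactly seven residues (positions 36 to 42)")
    # Map alignment position -> residue so the rules read like the text.
    p = {36 + i: aa for i, aa in enumerate(window.upper())}

    # 36G/S/T/A followed by RS, RR, or RK: typical NADP+-specific pattern.
    if p[36] in "GSTA" and p[37] == "R" and p[38] in "SRK":
        return "NADP+ specific"
    # 36G/A/S/TxxxRxR42: second NADP+ pattern, with the R shifted to position 40.
    if p[36] in "GAST" and p[40] == "R" and p[42] == "R":
        return "NADP+ specific (shifted R)"
    # 36GxxR39 or 36N: able to use either cofactor.
    if (p[36] == "G" and p[39] == "R") or p[36] == "N":
        return "NAD(P)+ broad"
    return "unassigned"


# Invented example windows; 'A' is used only as filler for unconstrained positions.
for example in ["GRSAAAA", "GAAARAR", "GAARAAA", "NAAAAAA", "DAAAAAA"]:
    print(example, classify_cofactor_window(example))
```

A window can in principle satisfy more than one rule, so the precedence chosen here is a design decision of the sketch and would need to be checked against experimentally characterized sequences.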
From an inspection of Fig. 4, ... Some cohesion groups have a split membership with respect to cofactor specificity (see Fig. 6).
Cofactor Specificity Divergence in TyrCG-17

TyrCG-17 is a large cohesion group made up of the Actinobacteridae (subclass rank), whose experimentally studied membership so far possesses l-arogenate-specific TyrA proteins. These have, however, diverged with respect to the cofactor substrate utilized, being either NAD+ specific or broadly NAD(P)+ specific. This divergence of cofactor specificity correlates perfectly with a bifurcation of TyrA sequence clustering, which is evident in the multiple alignment shown in Fig. 5A.
Members of TyrCG-17 are thought to all be l-arogenate specific, and it is perhaps surprising that the narrowing of cofactor specificity for the upper block of sequences in Fig. 5A ...

SNAPSHOTS OF TyrA CHARACTER STATES IN A PHYLOGENETIC CONTEXT

A Tool To Track Character State Variations

The comparative assessment of various character state features of TyrA proteins is potentially useful for a detailed bioinformatic analysis. The ability to track various features of interest that covary with one another can lead to important insights and to testable hypotheses. Considering the large number of genomes already sequenced, together with the proliferation of new genomes coming online, a systematic way to manage and access data that builds upon a basic store of careful and detailed study is needed. Otherwise, the volume of information is overwhelming. Some questions will be generally applicable to most metabolic subsystems. For example, what are the phylogenetic boundaries of the organisms that in common possess TyrA proteins belonging to a given cohesion group? Which events of LGT can be tracked through the identification of intruder sequences? What gene fusions are present in a given cohesion group (thus implying a common origin)? If gene fusion panels like panel 13 of Fig. 6 ... Online at the SEED (http://theseed.uchicago.edu/FIG/Html/TyrAPanels.html), clicking the “compare TyrA panels” option allows a choice of up to three panels for side-by-side comparison. The individual panels are expandable with a built-in magnifier, and links are provided at the top for navigation to the extended table.

Phylogenetic Boundaries

Panel 1 of Fig. 6 ... The phylum Proteobacteria exhibits relatively great overall divergences with respect to TyrA sequences such that cohesion groups usually parallel a formal order or a collection of orders. 
Only TyrA sequences from the Epsilonproteobacteria are represented at the class taxon level as members of a single cohesion group (Fig. 6). One member of TyrCG-6 (Marinobacter aquaeolei) as well as one orphan of the upper Gammaproteobacteria (Microbulbifer degradans, recently reclassified as Saccharophagus degradans) are classified at the NCBI as belonging to the Alteromonadales. However, TyrA members present in the Alteromonadales are otherwise housed by lower Gammaproteobacteria. M. aquaeolei and M. degradans clearly seem to have multiple properties characteristic of upper Gammaproteobacteria. They lack many evolved characteristics of lower Gammaproteobacteria. For example, a member of the latter superorder (exemplified by species of Shewanella within the Alteromonadales) possesses TyrAβ, an aroHI-tyrA fusion, a tyr operon containing a newly emerged paralog encoding a third regulatory isoenzyme of 3-deoxy-d-arabino-heptulosonate 7-phosphate (DAHP) synthase, a tyrR regulatory gene, and a complete trp operon including a trpD-trpC fusion. These are all newly evolved character states that typify lower Gammaproteobacteria (more detail can be found in Fig. 7).
Two other TyrA sequences from the upper Gammaproteobacteria are orphans, with one (Methylococcus capsulatus) being from the order Methylococcales and the other (Acidithiobacillus ferrooxidans) being from the order Acidithiobacillales. New TyrA sequences from incoming genomes belonging to these orders will very likely join the orphans, producing new cohesion groups. The distribution of cohesion groups in the various nonproteobacterial taxa is covered in panels 5 and 6 of Fig. 6.

Xenolog Intruders

The presence of xenologs in some cohesion groups is portrayed in panel 9 of Fig. 6.

Substrate Specificities

The distributions of the various specificities for the cofactor substrate are shown in panel 10 of Fig. 6. The distributions of the three specificity patterns for the cyclohexadienyl substrate are shown in panel 11 of Fig. 6.

Gene Fusions

The tyrA gene has been a popular fusion partner. Fusions of tyrA with various protein partners occur throughout the TyrAα and TyrAβ subhomology groupings, as displayed in Fig. 6.

Gene Context of tyrA

tyrA is frequently adjacent to other aromatic pathway genes, often being within an operon or within a supraoperon. Panel 15 of Fig. 6 ... Gene organization is not highly conserved and can be quite erratic, even within short phylogenetic distances (33). Even operons are surprisingly vulnerable to disruption, as documented in detail with the trp operon (80). However, functionally related genes frequently retain linkage relationships over at least short phylogenetic distances, sometimes with distinct shuffling patterns. The comparative analysis of gene clusters can be extremely informative, yielding valuable functional and evolutionary clues. Examples of how this approach can elucidate functional roles for “missing genes” have been reported (30, 59, 61). 
Each cohesion group section of the extended table has an arrowhead button after the cohesion group number, which allows navigation to a direct single-view comparison of the gene organizations surrounding tyrA within that cohesion group. These are extracted from all of the individual graphics that appear on the Protein Pages of each sequence at the SEED for which there is a current identification number. This accommodates a very convenient way to view the extent to which the gene organization is consistent within a cohesion group. Phylogenetic groupings at about the level of class often exhibit sufficient conservation of gene synteny that an ancestral gene organization can be deduced. Nevertheless, extensive gene shuffling occurs such that individual lineages will have highly scrambled (or even unrecognizable) versions of the consensus gene organization. The admixture in a given phylogeny of gene organizations conserved over relatively great phylogenetic distances (stability) in combination with dramatic gene shuffling over short phylogenetic distances (instability) is one of the intriguing mysteries of genomics. A detailed example of this was analyzed (67) in the upper Gammaproteobacteria and Betaproteobacteria, where a proposed ancestral supraoperon is gyrA>serC>aroQ-pheA>hisHb>tyrA>aroF>cmk>rpsA>himD. Only Ralstonia metallidurans in the Betaproteobacteria has a “perfect” ancestral supraoperon. Most of the other Betaproteobacteria exhibit very minor supraoperon alterations such as open reading frame insertions and single-gene deletions. Occasionally, more drastic gene shuffling (Chromobacterium violaceum) or partial supraoperon translocation (Nitrosomonas europaea) has occurred. At one extreme (species of Neisseria), the genes of the supraoperon have been completely dispersed. 
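A rough way to quantify how closely an observed gene arrangement follows the proposed ancestral supraoperon is to count shared gene adjacencies, as in the sketch below. This is an illustrative measure chosen here, not a method used in the cited analysis. It assumes the notation of this section, in which ">" separates neighboring genes and "-" joins fused genes, and it compares gene names literally, so two names are treated as different genes even if they denote related functions. The ancestral string is the one quoted above; the second string is the Pseudomonas aeruginosa arrangement quoted in this section.

```python
def gene_list(arrangement):
    """Flatten 'a>b-c>d' into ['a', 'b', 'c', 'd'], treating fused genes as neighbors."""
    genes = []
    for unit in arrangement.split(">"):
        genes.extend(unit.split("-"))
    return genes


def adjacencies(arrangement):
    """Return the set of unordered neighboring gene pairs in an arrangement."""
    genes = gene_list(arrangement)
    return {frozenset(pair) for pair in zip(genes, genes[1:])}


def fusions(arrangement):
    """Return the fused units (those written with '-') in an arrangement."""
    return [unit for unit in arrangement.split(">") if "-" in unit]


ancestral = "gyrA>serC>aroQ-pheA>hisHb>tyrA>aroF>cmk>rpsA>himD"
p_aeruginosa = "gyrA>serC>aroHI-pheA>hisHb>tyrA-aroF>cmk>rpsA>himD"

shared = adjacencies(ancestral) & adjacencies(p_aeruginosa)
print(len(shared), "of", len(adjacencies(ancestral)), "ancestral adjacencies retained")
print("fusions:", fusions(p_aeruginosa))
```

Counting adjacencies rather than exact positions tolerates insertions and translocations of whole blocks, which suits the pattern described here of largely intact supraoperons with local alterations.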
An entirely parallel situation is found in the upper Gammaproteobacteria, where most organisms house near-perfect ancestral supraoperons that differ only slightly in having gene insertions, gene deletions, or gene fusions. Pseudomonas aeruginosa, for example, possesses gyrA>serC>aroHI-pheA>hisHb>tyrA-aroF>cmk>rpsA>himD. Multiple fragmentation of the supraoperon has occurred elsewhere, e.g., in species of Xanthomonas and Xylella. It is quite striking that the supraoperon gene arrangement of R. metallidurans (Betaproteobacteria) is more similar to that of P. aeruginosa (upper Gammaproteobacteria) than to the supraoperon compositions of many other Betaproteobacteria. In reciprocal fashion, the P. aeruginosa supraoperon gene arrangement is more similar to that of R. metallidurans than to those of many other upper Gammaproteobacteria. The data described above illustrate that within a manageable phylogeny (cohesion group), a particular order of dynamic events of gene ordering can be deduced, yielding a likely ancestral gene order. Parallel analyses at nearby phylogenetic nodes with a roughly equivalent hierarchical level can then lead to a systematic deduction of the ancestral synteny that predated those deduced for the sister nodes.

Data That Are Relevant to the Indel Hypothesis

Panels 12 to 18 of Fig. 6 ... Examples of the application of the snapshot tool are pursued in some detail in later sections of this review.

ORGANISMS THAT CARRY MULTIPLE HOMOLOGS

PapC, a Functionally Specialized Paralog

Some dehydrogenases in the TyrA family utilize 4-amino-prephenate as a substrate in a reaction series that leads to 4-amino-phenylalanine and ultimately to the antibiotic chloramphenicol. The otherwise invariant residue at position 154 in Fig. 3 ... Surprisingly, all remaining PapC paralogs (which have an asparagine residue at position 154) reside in a single cohesion group located in the TyrAβ assemblage (not shown in Fig. 2).

Intra-Cohesion-Group TyrA Paralogs

Gene duplication is a frequent, ongoing process, with gene duplicates often being lost. Functionally redundant paralogs from a given organism that are present in the same cohesion group are of recent origin and likely exhibit little functional difference. Desulfuromonas acetoxidans provides one example of recent intra-cohesion-group paralogs (present in TyrCG-14). The only other example at present is the functionally differentiated PapC paralog of Streptomyces coelicolor, which occurs in TyrCG-17 with a TyrAa protein as discussed directly above.

Extra-Cohesion-Group TyrA Paralogs

Rhodospirillum rubrum and Silicibacter pomeroyi are finished genomes of the Alphaproteobacteria that each possess one TyrA species in cohesion group TyrCG-12 and one TyrA species in TyrCG-30. TyrCG-12 is a large group of sequences from Alphaproteobacteria that belong to the TyrAα subhomology group. TyrCG-30, on the other hand, belongs to the TyrAβ subhomology grouping and contains two TyrA sequences in addition to the paralogs from R. rubrum and S. pomeroyi. Maricaulis maris, a finished genome that also belongs to the Alphaproteobacteria, lacks a paralog member in TyrCG-12. Thus, M. maris is so far alone among the Alphaproteobacteria in its complete reliance upon a TyrAβ-specified dehydrogenase for tyrosine biosynthesis. The fourth member of TyrCG-30 is from Myxococcus xanthus (Deltaproteobacteria and an unfinished genome). The latter is provisionally labeled as an intruder sequence, although the alternative scenario, that the M. xanthus sequence is a native sequence from which the TyrAβ intruder sequences present in a few genera of Alphaproteobacteria originated, certainly cannot be ruled out. It is interesting that TyrA from M. xanthus is the only member of TyrCG-30 to have a fused chorismate mutase domain (tyrA-aroHI), distinct from other chorismate mutase fusions because it is a C-terminal fusion. Regardless of whether M. 
xanthus was an LGT donor or recipient, the fusion must have occurred after the LGT event.

Ortholog/Xenolog Combinations

The above-described apparent extra-cohesion-group paralogs might be cases of ancient paralog divergence, but it is also possible that one apparent paralog is in fact a xenolog. However, we cannot be sure of the latter unless an LGT donor is identified. One clear example of an ortholog/xenolog combination is found in two species of Nostoc, whose TyrA orthologs reside in TyrCG-16 (which contains TyrA orthologs from all cyanobacteria). In addition, the two species of Nostoc possess a xenolog intruder belonging to TyrCG-1. Hence, the LGT donor was a lower gammaproteobacterium. Interestingly, the Nostoc proteins have an N-terminal extension that appears to be a remnant of the fused chorismate mutase domain, which is present in all other members of TyrCG-1.

SIGNIFICANCE OF THE TyrAα/TyrAβ SCHISM

Panel 1 of Fig. 6

Lateral Gene Transfer between Superkingdoms?

All of the TyrA sequences from the superkingdoms Archaea and Eukaryota are located in the TyrAβ subhomology group, and most of the TyrA sequences from the superkingdom Bacteria are located in the TyrAα subhomology group. However, a scattered number of bacterial sequences also belong to the TyrAβ grouping. Among the Proteobacteria, the latter include all of the lower Gammaproteobacteria (TyrCG-1), TyrCG-4 from the upper Gammaproteobacteria, a small group of TyrA sequences from the Alphaproteobacteria (TyrCG-30) (also containing one intruder sequence carried by a deltaproteobacterium), and TyrCG-15, which is populated by two sequences from the Deltaproteobacteria. No Betaproteobacteria or Epsilonproteobacteria that host proteins belonging to the TyrAβ subhomology grouping are currently known. The phylum Bacteroidetes is represented by TyrCG-24 and TyrCG-23 in the TyrAα and TyrAβ subhomology groups, respectively. The Alphaproteobacteria exhibit some novel variations.
Most of them contribute to a 38-member cohesion group (TyrCG-12), which, along with an orphan sequence (Pelagibacter ubique), belong to the TyrAα subhomology group. Three Alphaproteobacteria have members that occupy the TyrAβ subhomology group (TyrCG-30). Two of the latter (Rhodospirillum rubrum and Silicibacter pomeroyi) also host paralogs among the above-mentioned group of 38, thus being the only organisms so far known to possess a TyrA member of each subhomology group. The third member of TyrCG-30 (Maricaulis maris) is the only alphaproteobacterium whose sole TyrA sequence belongs to TyrAβ. Could all of the bacterial sequences that fall into the TyrAβ subhomology group be explained as acquisitions from archaeal or eukaryotic donors via LGT? If so, multiple LGT events would have had to occur independently in different bacterial lineages since those Bacteria whose sequences belong to the TyrAβ subhomology grouping do not cluster together in a common lineage. None of the seven cohesion groups within the TyrAβ subhomology grouping that have bacterial membership contain a sequence of the Archaea or Eukaryota that would implicate an LGT donor. This, of course, is also true for the two bacterial orphan sequences present in the TyrAβ subhomology grouping. Since genomic sampling is still quite minimal in the Archaea, it is possible that the LGT donors are simply unknown. However, the probability of this is lessened considering that a donor has not materialized on nine different occasions. Does Membership within TyrAβ Reflect Protein-Protein Interactions? We believe that it is likely that the TyrAβ subhomology group contains TyrA proteins that exhibit functionally critical protein contacts with either fused proteins or partnered members of a complex. In contrast, members of TyrAα are postulated to function independently of any protein partners. In a previous paper (67), it was noted that some TyrA sequences, such as that from E. 
coli, possessed distinctive indel structuring (insertions and deletions) in alignments with what are here called TyrAα subhomology group members. The above-described types of sequences (herein named TyrAβ) were originally named TyrAc_Δ (cyclohexadienyl dehydrogenases that have indel structuring). The previous TyrAc_Δ designation is herein abandoned in favor of the current TyrAβ designation (one which does not imply any substrate specificity). This indel hypothesis is stimulated largely by experimental work with E. coli and some close relatives. Thus, TyrA from E. coli (and all other lower Gammaproteobacteria) is fused at the N terminus with chorismate mutase (AroHI). Chen et al. (18) demonstrated that neither chorismate mutase nor cyclohexadienyl dehydrogenase reactions of E. coli are fully competent when isolated from one another. Sun et al. (71) cited a variety of other documentation to suggest that the two fused domains are functionally dependent. There is the suggestive correlation that lower Gammaproteobacteria have the fusion and belong to TyrAβ, whereas the closely related upper Gammaproteobacteria lack the fusion and belong to TyrAα. Xanthomonas and Xylella species (TyrCG-4) are exceptions among the upper Gammaproteobacteria in that they belong to the TyrAβ subhomology grouping. However, these TyrA species exhibit another fusion pattern: a C-terminal fusion with ACT, a broadly distributed regulatory domain. The intruder TyrA sequences present in species of Nostoc which are derived from the lower Gammaproteobacteria lineage possess an N-terminal extension that appears to be a remnant of the fused chorismate mutase, otherwise found in TyrCG-1. Key catalytic residues needed for chorismate mutase activity have not been conserved. It is interesting to consider that the extension nevertheless persists in order to maintain the domain-domain interactions proposed for TyrAβ enzyme species. 
This would be worthwhile to test experimentally since one can potentially evaluate what regions are needed to support TyrA activity without complications related to chorismate mutase activity. In addition to fusions with AroHI and the ACT domain, other members of TyrAβ exhibit fusions with a domain called REG (67) or have sequence extensions that may be unknown regulatory domains. Thus, cohesion groups that fall within the TyrAβ subhomology grouping consist of sequences that have experienced a wide variety of different and independent indel events postulated to be associated with functional domain-domain interactions. This variety plus normal phylogenetic divergence explain the separation of cohesion groups within the TyrAβ subhomology grouping. However, at the broadest level, the cohesion group members of TyrAβ have converged because they all carry indel disruptions of the highly conserved motifs that members of TyrAα share. The indel hypothesis does not require that members of the TyrAα subhomology group lack TyrA fusions and that members of the TyrAβ subhomology group possess TyrA fusions, although this is certainly the trend (Fig. 6).

Sequence convergence following the independent fusion of interacting domains in widely separated organisms was demonstrated (78) in a simpler case where only two interacting domains were involved. Xie et al. (78) showed that four different and large TrpAa (anthranilate synthase aminase) cohesion groups were populated by sequences from the Actinobacteridae, Cyanobacteria, upper Gammaproteobacteria/Betaproteobacteria, and Alphaproteobacteria, respectively. Four TrpAb (anthranilate synthase amidotransferase) cohesion groups were populated by sequences from exactly the same organisms. However, several organisms in each of the former taxa possessed TrpAa and TrpAb domains that were fused to one another and that did not belong to the expected cohesion groups made up of free-standing TrpAa or TrpAb domains.
In comparison with the four separated positions of free-standing TrpAa domains on a phylogenetic tree, the fused TrpAa- domains were all clustered together on a divergent branch of the tree. (The hyphen and its placement signify a fusion at the C terminus.) Similarly, in comparison with the positions of free-standing TrpAb domains on a phylogenetic tree, all of the fused -TrpAb domains were clustered together on one divergent branch of the tree. Evidence has been presented (76) that TrpAa-TrpAb fusions have occurred independently as many as seven times and that the convergence observed for sequences from diverse taxa is the consequence of rigid constraints imposed for proper protein-protein interactions of these subunits.

Utility of Cohesion Group Snapshots

In our system, any TyrA features deemed to be of interest are displayed by painting them on the cohesion group tree shown in Fig. 2.

Are Essential Extradomain Contacts Needed for TyrA Members of TyrAβ?

It is postulated that members of the TyrAβ subhomology group have a supradomain core region that is functionally dependent upon supradomain contacts with either a fused protein or a complexed protein. This was experimentally demonstrated for E. coli (18) and is reasonably extrapolated to all members of TyrCG-1. An excellent and well-organized information background now exists for selecting key, well-spaced cohesion group members to test experimentally whether isolated catalytic core regions of TyrAα members are catalytically competent, in contrast to isolated supradomain regions of TyrAβ members, which are predicted to require contacts with extra-TyrA protein domains. In the case of E. coli, the fused chorismate mutase (AroHI) has a reciprocal dependence upon the fused TyrA for normal function. This raises the question of whether fused chorismate mutases and free-standing chorismate mutases of the AroHI homology class would also exhibit a bifurcated divergence similar to the TyrAα/TyrAβ dichotomy.
This is certainly worthy of further examination. Interesting Specificity Issues Streptomyces coelicolor possesses two paralogs of recent divergence but functionally differentiated: tyrA and papC genes. As discussed above, S. coelicolor PapC is widely divergent from all other PapC proteins, the latter of which collect together within a single TyrAβ cohesion group (not shown in the figures and tables of this review). S. coelicolor PapC is assumed to play a role in the synthesis of calcium-dependent antibiotic because of the position of its encoding gene in the middle of the large CDA (calcium-dependent antibiotic) gene cluster (65). PapC proteins are generally assumed to utilize 4-amino-prephenate as a substrate, thereby producing 4-amino-phenylpyruvate as product. It is quite possible, however, that a given PapC could utilize 4-amino-arogenate instead. This specificity could apply if the order of dehydrogenase and transaminase steps was reversed (as can occur in tyrosine biosynthesis). Since the S. coelicolor PapC paralog is of recent origin and occupies the same cohesion group as its TyrA paralog, one might predict that the specificities of the two might be the same for the side chain of the substrate. Hence, we propose an experimentally testable idea, namely, that since TyrA from S. coelicolor is specific for l-arogenate (alanyl side chain), it is likely that the PapC paralog is specific for 4-amino-arogenate (alanyl side chain). Expanding the Evolutionary Context across Subsystems Cohesion groups can be formulated for single proteins, as exemplified by TyrA and the seven proteins of l-tryptophan biosynthesis, and the result can produce a picture of what features evolved in what lineages at what times. An evaluation of what character states evolved “purely” within a vertical genealogy and what character states were obtained by LGT can be deciphered. From the time of any new LGT acquisition that can be pinpointed, a new vertical genealogy can be tracked. 
Thus far, concatenates of the seven tryptophan pathway enzymes have been used to define supercohesion groups. The supercohesion groups, of course, have much more resolving power than do individual cohesion groups. The pathway of aromatic amino acid biosynthesis consists of a common trunk of seven reactions and three amino acid branches (http://www.aropath.lanl.gov/Visualizations/index.html). This can be thought of as four manageable metabolic subsystems that can eventually morph into a single subsystem. Since chorismate mutase and aromatic aminotransferase activities overlap the phenylalanine and tyrosine branches in a very intimate way, it would be logical to join TyrA, PheA, and the various homolog types responsible for chorismate mutase and aromatic aminotransferase in a single study, i.e., making up a single metabolic subsystem. It should be possible to assemble concatenates as a source of supercohesion groups that would represent the steps proceeding from chorismate to both phenylalanine and tyrosine. Finally, inclusion of the seven common pathway enzymes that feed all of the divergent branches of aromatic amino acid biosynthesis in the analysis will yield an integrated picture of what the milestone events were in each of the four individual subsystems and how these events may have impacted the gestalt of the overall pathway. An initial sense of how the larger picture can build may be gotten from a section below, which compares TyrA cohesion groups with tryptophan supercohesion groups. We anticipate that the eventual ability to “paint” the locations of cohesion groups corresponding to many metabolic subsystems on 16S rRNA trees would be valuable for a multitude of purposes. CANDIDATE TyrA PROTEINS FOR X-RAY CRYSTAL STUDIES Challenge of Broad-Specificity Reactions Enzymes can range between those that are exquisitely demanding and precise in their catalytic requirements and those that accelerate reactions that can accommodate alternative substrates or cofactors. 
The former enzymes seem to be encoded by genes that are generally larger and more conserved than genes encoding broad-specificity reactions. Highly specific and highly conserved enzymes are exemplified by dehydroquinate synthase, the second enzyme of aromatic amino acid biosynthesis, or 5-enolpyruvylshikimate-3-phosphate synthase, the sixth enzyme of aromatic amino acid biosynthesis (12). The latter enzyme combines two phosphorylated substrates with a precise and intricate mechanism that cannot tolerate much deviation. The multistep mechanism of dehydroquinate synthase involves alcohol oxidation, phosphate β elimination, carbonyl reduction, ring opening, and intramolecular aldol condensation. In such cases, X-ray crystal studies with an enzyme from just a single organism can provide widely applicable information. On the other hand, enzymes catalyzing broad-specificity reactions may have the pliability to accept a range of related substrates, may readily mutate from a given specificity to a closely related one, or may readily mutate to a narrowed or broadened profile of substrate specificity. Even where specificity for a particular substrate is the same in different members of a broad-specificity enzyme family, the pliability to allow divergence to different active-site variations that still accomplish exactly the same reaction may exist. These aspects of enzymatic plasticity, albeit intriguing, mean that a relatively large number of coordinated crystal studies are required if one is to fully understand the complete array of important amino acid contacts that fall under the catalytic umbrella of pliant enzyme families such as TyrA. Thus, one challenge is that whereas amino acid motifs that correspond to important active-site residues can be conspicuously invariable for ultraspecific enzymes, motifs may be much more elusive in multiple alignments of broad-specificity enzyme sets. 
In the latter case, careful and comprehensive work might reveal a series of motifs, each conserved and typifying particular lineages that carry a set of TyrA proteins. The potential results of a comprehensive series of X-ray crystal studies with a small enzyme having substantial catalytic plasticity can reasonably be expected to contribute general insight into what is required to make accurate functional inferences for the very large number of such “difficult” enzymes.

Informative Selections from TyrAα Subhomology Group Members

The existing comprehensive sequence analysis, as exemplified in Fig. 3,

Key variables of interest are TyrA crystals bound with any substrate for which the enzyme has catalytic competence. Given that each class of specificity for cyclohexadienyl substrate (prephenate specific, l-arogenate specific, or broad) is known to occur in combination with each class of specificity for pyridine nucleotide cofactor (NAD+ specific, NADP+ specific, or broad), this alone generates a qualitative total of nine comparative possibilities. An enzyme such as that from Ralstonia solanacearum (TyrCG-7) has roughly equal capabilities with NAD+ and NADP+ as well as roughly equal capabilities with l-arogenate and prephenate. Hence, there are four protein-substrate combinations that can be analyzed from this single TyrA species, each of which should be informative in comparison with TyrA proteins that can be selected for the various appropriate narrow specificities. Another dimension of complexity is that many broad-specificity TyrA species have order-of-magnitude preferences for one substrate or for one cofactor. These quantitative differences must have discernible parallels at the molecular level that distinguish them from the absolutely specific TyrA proteins or from broad-specificity TyrA proteins that accept alternative substrates about equally well. Ideal TyrA candidates for initial crystal studies are those that have been well characterized, are produced from organisms with complete genomes, and have core supradomains that are uncomplicated by fused catalytic or regulatory domains.
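The counts of nine comparative possibilities and four protein-substrate combinations mentioned above can be reproduced with a short enumeration. The following is only a minimal sketch: it assumes three classes for each variable (specific for one alternative, specific for the other, or broad), and the class labels and function names are illustrative choices, not terms taken from the tables of this review.

```python
from itertools import product

# Assumed classes for each variable: specific for one alternative,
# specific for the other, or broad (accepts both). Labels are illustrative.
SUBSTRATE_CLASSES = ("prephenate-specific", "l-arogenate-specific", "broad")
COFACTOR_CLASSES = ("NAD+-specific", "NADP+-specific", "broad")


def specificity_profiles():
    """Return every (substrate class, cofactor class) pairing."""
    return list(product(SUBSTRATE_CLASSES, COFACTOR_CLASSES))


def ligand_pairs(substrates, cofactors):
    """Return every substrate/cofactor pair a single enzyme could be bound with."""
    return list(product(substrates, cofactors))


profiles = specificity_profiles()

# An enzyme that is broad for both variables, as described in the text
# for the Ralstonia solanacearum enzyme.
broad_enzyme_pairs = ligand_pairs(("l-arogenate", "prephenate"), ("NAD+", "NADP+"))

print(len(profiles))            # 9
print(len(broad_enzyme_pairs))  # 4
```

Quantitative preferences (for example, an order-of-magnitude preference for one cofactor) are not captured by these qualitative classes and would need an additional field per enzyme.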
Examples of such organisms selected from the TyrAα subhomology grouping are Zymomonas mobilis (broad-specificity cyclohexadienyl dehydrogenase with a preference for l-arogenate) (NAD+ specific), Aquifex aeolicus (cyclohexadienyl dehydrogenase markedly favoring prephenate) (NAD+ specific), Rhodopseudomonas palustris (cyclohexadienyl dehydrogenase with a marked preference for prephenate) (NADP+ specific), Ralstonia eutropha (cyclohexadienyl dehydrogenase) {broad cofactor specificity [NAD(P)+]}, Neisseria gonorrhoeae (cyclohexadienyl dehydrogenase with marked preference for prephenate) (NAD+ specific), Nitrosomonas europaea (l-arogenate specific and NADP+ specific), Corynebacterium glutamicum (l-arogenate specific, with a marked preference for NADP+ over NAD+), Synechocystis sp. (l-arogenate-specific and NADP+ specific), Gluconobacter oxydans (prephenate specific and NADP+ specific), and Clostridium difficile (prephenate specific and NAD+ specific). Although many additional TyrA proteins from organisms whose genomes unfortunately are not yet sequenced have been well characterized, it seems likely that this will be largely ameliorated in the near future, considering the high and increasing rate of genome sequencing. Although a well-spaced phylogenetic selection of TyrA proteins is generally desirable, in some cases, it might also be worthwhile to select TyrA proteins from a single cohesion group that have variant properties of substrate selectivity. This can be comparable to the approach of selecting specificity mutants for comparison with the wild-type parent in order to carry out structural analysis. For example, the entire cyanobacterial phylum possesses a TyrA member belonging to a single cohesion group (TyrCG-16). An extensive enzymological comparison indicated that most, if not all, cyanobacterial TyrA enzymes can utilize l-arogenate and NADP+ as substrates (29). 
Although some are absolutely specific for these two substrates, cyanobacteria frequently express broad-specificity enzymes that are capable of utilizing NAD+ (albeit always less well than NADP+). Less commonly, broad specificity for the cyclohexadienyl substrate exists, although l-arogenate is always utilized better than prephenate. (At one extreme, Synechocystis sp. strain PCC7509 uses prephenate 48% as well as l-arogenate at substrate saturation.) A second example that offers interesting comparative possibilities is the collection of TyrA proteins from the Betaproteobacteria. All members of TyrCG-7, TyrCG-8, and TyrCG-10 and four orphans (Table 2) are broad-specificity cyclohexadienyl dehydrogenases that have the broad cofactor specificity motif pattern 36GxxRS40 (Fig. 4).

Informative Selections from TyrAβ Subhomology Group Members

The basis for supposing that TyrA proteins of the TyrAβ subhomology grouping exhibit functional interactions with attached catalytic or regulatory domains (or perhaps do so via protein-protein complexes) is discussed above. Compared to the TyrAα subhomology group, relatively few TyrA enzymes from the TyrAβ subhomology group have been characterized. Of course, TyrAc from E. coli is an obvious choice because of the abundance of experimental work with it, including evidence upon which the indel hypothesis is based (see references 11 and 71 and references therein). TyrAc from E. coli and TyrAc from Aquifex aeolicus should be a good comparative match as selections taken from the TyrAβ and the TyrAα subhomology groups, respectively. Each of these is NAD+ specific, and each is a cyclohexadienyl dehydrogenase that has a marked preference for prephenate as a substrate. Each is sensitive to l-tyrosine inhibition.
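The matching logic used above (same specificity profile, opposite subhomology group) can be sketched as a small lookup. This is a hedged illustration only: the `TyrAEntry` record, the `matched_partners` function, and the simplified profile labels are hypothetical constructs introduced here, and the entries are a partial transcription of the descriptions given in the text rather than a curated data set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TyrAEntry:
    organism: str
    subhomology: str  # "alpha" or "beta"
    substrate: str    # simplified label for cyclohexadienyl specificity
    cofactor: str     # simplified label for cofactor specificity


# Profiles transcribed from the text; the labels are simplified assumptions.
ENTRIES = [
    TyrAEntry("Zymomonas mobilis", "alpha", "broad, prefers l-arogenate", "NAD+"),
    TyrAEntry("Aquifex aeolicus", "alpha", "prefers prephenate", "NAD+"),
    TyrAEntry("Rhodopseudomonas palustris", "alpha", "prefers prephenate", "NADP+"),
    TyrAEntry("Neisseria gonorrhoeae", "alpha", "prefers prephenate", "NAD+"),
    TyrAEntry("Nitrosomonas europaea", "alpha", "l-arogenate specific", "NADP+"),
    TyrAEntry("Clostridium difficile", "alpha", "prephenate specific", "NAD+"),
    TyrAEntry("Escherichia coli", "beta", "prefers prephenate", "NAD+"),
]


def matched_partners(query, entries):
    """Return organisms from the other subhomology group with the same profile."""
    return [
        e.organism
        for e in entries
        if e.subhomology != query.subhomology
        and e.substrate == query.substrate
        and e.cofactor == query.cofactor
    ]


e_coli = next(e for e in ENTRIES if e.organism == "Escherichia coli")
partners = matched_partners(e_coli, ENTRIES)
print(partners)  # ['Aquifex aeolicus', 'Neisseria gonorrhoeae']
```

Substrate and cofactor profile alone leaves more than one candidate here; further criteria discussed in the text, such as sensitivity to l-tyrosine inhibition or the availability of a crystal structure, would be added as extra fields to narrow the choice.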
Xanthomonas campestris and other members of TyrCG-4 are upper Gammaproteobacteria that possess a TyrA enzyme with a C-terminal ACT domain, with the latter perhaps being responsible for placement in the TyrAβ subhomology grouping. (Note that the presence of an attached ACT domain does not necessarily mean that a so-endowed TyrA species will be in the TyrAβ subhomology grouping since many gram-positive bacteria in the TyrAα subhomology grouping, e.g., all members of TyrCG-18, have an ACT domain.) In contrast to the members of TyrCG-4, the other upper Gammaproteobacteria (TyrCG-2, TyrCG-3, TyrCG-5, TyrCG-6, and five orphans) lack an attached ACT domain and belong to the TyrAα subhomology grouping. X. campestris TyrA has been characterized as being NAD+ specific and broadly specific for cyclohexadienyl substrate. The best match for this substrate profile among the upper Gammaproteobacteria in the TyrAα subhomology grouping would be TyrAc produced by any of three orphans: Acidithiobacillus ferrooxidans, Methylococcus capsulatus, or Nitrosococcus oceani. The TyrA protein from Coxiella burnetii might also be worth considering for comparison. Like the X. campestris protein, it belongs to the TyrAβ subhomology grouping, but it lacks an ACT domain. This TyrA species is NAD+ specific, but its cyclohexadienyl specificity is uncertain. Also, we cannot be sure that this TyrA enzyme is native to the upper Gammaproteobacteria since it resides in TyrCG-26, which is an unresolved phylogenetic mixture. Finally, TyrA proteins from higher plants (TyrCG-95) are well characterized as being l-arogenate-specific and NADP+-specific enzymes. Since the Synechocystis sp. strain PCC6803 enzyme (TyrAα subhomology group) has the same specificity profile as TyrA from organisms such as Arabidopsis thaliana (TyrAβ subhomology group), X-ray crystal comparative studies should be illuminating.

Inhibition Properties: Insight into Binding of the 1-Carboxy Moiety?
For the simplest TyrA proteins where allosteric domains or interacting catalytic domains are not attached, it has been proposed (77) that the product inhibitors (either l-tyrosine or 4-hydroxyphenylpyruvate) act directly at the active site as classical competitive inhibitors. Thus, there are cases where an enzyme is specific for prephenate (having a pyruvyl side chain) and is inhibited by 4-hydroxyphenylpyruvate (also having a pyruvyl side chain) but is not inhibited by l-tyrosine (alanyl side chain). On the other hand, enzymes are known, such as those from higher-plant plastids, that are specific for l-arogenate (alanyl side chain) and are inhibited by l-tyrosine (alanyl side chain) but not by 4-hydroxyphenylpyruvate (pyruvyl side chain). For simple TyrA enzymes that lack discrete allosteric domains or interacting fusions, it generally seems to hold that the specificity of this core supradomain for the side chain of any substrate accepted (i.e., pyruvyl and/or alanyl) will parallel the specificity of product inhibition. Neisseria gonorrhoeae possesses a cyclohexadienyl dehydrogenase that prefers prephenate markedly over l-arogenate as a substrate. Accordingly, inhibition by 4-hydroxyphenylpyruvate is potent, and inhibition by l-tyrosine is weak. Thus, whenever inhibition has been observed, the side chain specificities of inhibitor and substrate parallel one another. However, some TyrA proteins are completely insensitive to competitive inhibition by the product. Thus, Acidovorax facilis and Rubrivivax gelatinosus possess TyrAc enzymes that are not sensitive to inhibition by either l-tyrosine or 4-hydroxyphenylpyruvate, Zymomonas mobilis TyrAc is not sensitive to inhibition by either l-tyrosine or 4-hydroxyphenylpyruvate, and Nitrosomonas europaea TyrAa is not sensitive to inhibition by the l-tyrosine product.
Presumably, the latter TyrA species require the ring carboxylate for binding, whereas TyrA species that are sensitive to product inhibition must not require the ring carboxylate for binding. Comparison of reasonably close sets of TyrA proteins that differ in being resistant or sensitive to product inhibition could give insight into residue contacts that are important for binding of the ring carboxylate. For example, a reasonable choice for comparison might be two TyrA members of the Betaproteobacteria. TyrAc enzymes from Acidovorax facilis (TyrCG-10) and Burkholderia cepacia (TyrCG-7) are very similar in having broad specificities for the two cyclohexadienyl substrates and broad specificities for cofactor. The alternative substrates and alternative cofactors are accepted about equally well. However, the A. facilis enzyme is totally refractory to product inhibition, whereas the B. cepacia enzyme is sensitive to product inhibition. Sun et al. (71) pointed out that a glycine-rich region, 273-GGG-275, immediately preceding the 277-RxxxR-284 motif of Aquifex TyrA, seems to play a critical role in positioning 278-D′ into the active site within interacting distance of the ring carboxylate of prephenate (numbering as given in Fig. 3).

TyrAc from Aquifex aeolicus, one of the two TyrA proteins for which X-ray crystal studies exist (71), has a marked preference for prephenate and is NAD+ specific. Since it is quite sensitive to tyrosine inhibition (11), one would expect even greater sensitivity to inhibition by 4-hydroxyphenylpyruvate, but this was not tested. This TyrA sequence is currently an orphan sequence, so comparisons with relatively close orthologs are not yet possible. The second subject of an X-ray crystal study is Synechocystis sp. (48). This l-arogenate-specific, NADP+-specific enzyme was reported to be insensitive to inhibition by l-tyrosine. Unfortunately, this is at odds with a report by Bonner et al.
(10), who detailed good sensitivity of TyrAa from the same strain to competitive inhibition by l-tyrosine. Enzymes that become selectively desensitized to inhibition while maintaining catalytic competence are known, but these usually are enzymes that have a distinct allosteric domain (or subunit). Legrand et al. (48) suggested that the difference might be due to “mutations” in four amino acids very near the C terminus. However, this apparent difference in sequence was due to an inadvertent transposition of a glutamine residue in the preparation of Fig. 7

Selections Based upon Other TyrA Features

Thus far, in this review, a total of 15 organisms have been suggested as examples that could be selected for comparative studies from the perspective of (i) understanding the nature of variable specificities for cyclohexadienyl substrate or cofactor reactant, (ii) gaining insight into the distinct difference between the TyrAα and TyrAβ subhomology groupings, or (iii) elucidating what dictates whether the 1-carboxy moiety is required for binding and whether this determines sensitivity to product inhibition directly at the active site. Still other features deemed to have significance could be used as criteria for selecting organisms as a source of TyrA protein. These features would not necessarily be independent of some of the above-described considerations. For example, the motif RxxxR has been discussed above as a character state that was suggested in the X-ray crystal study described by Sun et al. (71) to be important in the mechanism employed by the TyrA protein of Aquifex aeolicus. The idea has been presented that in proteins belonging to the TyrAβ subhomology family (10, 71), this motif has been disrupted by extra-TyrA contacts extended from an attached or complexed domain.
This is consistent with the near-total conservation of this motif throughout the TyrAα subhomology grouping and with its near-total absence in proteins belonging to the TyrAβ subhomology grouping. Thus, this motif seems intimately relevant to the second perspective described above. Scrolling through the extended table online shows that exceptions in the TyrAα subhomology grouping whereby the motif is disrupted include one member of TyrCG-5, some members of TyrCG-16, two members of TyrCG-11, the Flavobacteria component of TyrCG-13, half the members of TyrCG-16, most members of TyrCG-24, and one of the two members of TyrCG-31. Comparison of a motif-present member with a motif-absent member in the latter cohesion groups might be of particular value because the motif difference seen in each pair exists in a background of close phylogeny. The X-ray crystal study of TyrA from Aquifex aeolicus (71) indicated that the RxxxR motif comprises part of an ionic network, which was proposed to support a gated mechanism for the access of substrate to the active site. However, the X-ray crystal study of TyrA from Synechocystis sp. (48) asserted that this patch of basic residues does not seem to play a critical role in the binding of substrate. Synechocystis sp. belongs to TyrCG-16, a cohesion group that contains a total of 16 cyanobacteria. Although the subject of the X-ray crystal study has the motif, it is absent in 10 members of TyrCG-16. This suggests the possibility that the presence of the motif in some cyanobacteria may be only coincidental, and it may not have the functional significance that generally applies in the TyrAα subhomology grouping. It was also noted (71) that the rightward R residue of the motif (R284 in Fig. 
3)

The Snapshot Tool for Facilitating Selection Choices for Comparative Analysis

In this section, the RxxxR motif is considered further as an example of how the snapshot tool can be used to make rational choices for protein selection. Panel 16 of Fig. 6

Example 1. Suppose that one chooses to think about the Gammaproteobacteria (a taxon at the level of class) in terms of how it has diverged into cohesion groups, where these cohesion groups belong in terms of the two primary subhomology groupings, and what the distribution pattern is for the RxxxR motif. If panels 2 and 16 of Fig. 6

Example 2. Suppose panel 3 of Fig. 6

The Alphaproteobacteria mostly populate TyrCG-12 in the TyrAα subhomology grouping, where they consistently possess the RxxxR motif. One can see just by considering TyrCG-12 alone that the significance of this motif has some broader meaning than a relationship to substrate/cofactor specificity in view of the widely different specificities previously described for organisms such as Zymomonas mobilis, Rhodopseudomonas palustris, and Gluconobacter oxydans, all members of TyrCG-12. In spite of its overall sequence divergence from most other Alphaproteobacteria, the TyrAα orphan Pelagibacter ubique also possesses the RxxxR motif. The three members of TyrCG-30 are the only Alphaproteobacteria present in the TyrAβ subhomology group, and all of them lack the motif. Thus, at the taxon level of class, TyrA proteins from the Alphaproteobacteria have diverged to form one orphan, one small cohesion group, and one large cohesion group. Only the small cohesion group belongs to the TyrAβ subhomology grouping, and this correlates perfectly with the lack of the RxxxR motif. Both Rhodospirillum rubrum and Silicibacter pomeroyi possess paralog members of TyrCG-12 and TyrCG-30, so a comparison of one of these two paralog pairs should also be rewarding. TyrA proteins from the Deltaproteobacteria populate three cohesion groups.
Most of them are in TyrCG-14, which occupies the TyrAα subhomology grouping, and these have the RxxxR motif and are the only Deltaproteobacteria that are NAD+ specific. Members of TyrCG-15 and an orphan from Syntrophobacter fumaroxidans belong to the TyrAβ subhomology grouping and lack the motif, as expected. Selection of one TyrA from each of the two subhomology groupings yields a pair where the TyrAβ subhomology grouping member possesses a core supradomain length that is shortened (Fig. 6,

Experimental Truncation of Fused Domains

We suggest that the catalytic function of TyrA for members of the TyrAα grouping is not dependent upon an attached domain even if such fusions are present. It is predicted that the removal of such attached domains will not directly affect the catalytic reaction. This has in fact been shown for the tyrAc-aroF fusion of Pseudomonas stutzeri, where removal of the C-terminal AroF catalytic domain had no effect upon the remaining TyrA domain (77). In addition, TyrCG-7 contains 11 members, only three of which possess a tyrA-aroF fusion. This recent fusion has not distanced the TyrA domain of the small clade of Burkholderia species that contain it from the unfused TyrA domains of the sister Burkholderia species and species of Ralstonia that occupy the cohesion group. If new indel contacts had developed in the newly evolved TyrA-AroF protein to create interdependent domains, one would expect these TyrA domains to have diverged away from the unfused TyrA domains in TyrCG-7.

TyrA proteins frequently possess a C-terminal ACT domain, as exemplified by the well-studied Bacillus subtilis enzyme (17), which belongs to the TyrAα subhomology grouping. It would be quite interesting to examine this enzyme following the removal of the ACT domain, which is an allosteric domain. This amino acid binding domain presumably accounts for the sensitivity of B. subtilis TyrAp to inhibition by l-tyrosine, l-phenylalanine, l-tryptophan, and d-tyrosine.
Removal of the ACT domain should abolish these amino acid sensitivities, leaving only the sensitivity to inhibition by 4-hydroxyphenylpyruvate intact. This expectation is enhanced by the fact that exactly these properties were obtained with the selection of a d-tyrosine-resistant, tyrosine-excreting mutant in 1970 (17). Similar opportunities for examining the effects of removing a C-terminal ACT domain exist in other cohesion groups belonging to the TyrAα subhomology grouping, e.g., TyrCG-20, TyrCG-21, and TyrCG-22.

In contrast with the above-described expectations for TyrA proteins belonging to the TyrAα subhomology grouping, experimental truncations that remove attached catalytic or regulatory domains of TyrA proteins belonging to the TyrAβ subhomology grouping are expected to impact TyrA catalysis directly. This has already been demonstrated following removal of the N-terminal chorismate mutase domain from E. coli aroHI-tyrAc (18), and X-ray crystal results that demonstrate the domain-domain contacts projected by Bonvin et al. (11) would be most welcome. Xanthomonas campestris and other members of TyrCG-4 possess a C-terminal ACT domain, just like B. subtilis and other members of TyrCG-18. Since the former and latter represent the TyrAβ and TyrAα subhomology groupings, respectively, the differences in how this allosteric domain interacts should be fascinating. Another attached regulatory domain of potential interest is the C-terminal REG domain present in members of TyrCG-80 (Euryarchaea_1).

COMPARISON OF TYROSINE AND TRYPTOPHAN PATHWAY COHESION GROUPS

Background

Concatenated sequences of the seven tryptophan pathway enzymes that specifically participate in primary biosynthesis were previously assembled and used to construct trees. This produced seven supercohesion groups and 11 unnumbered orphans (78).
The compositions of these multimembered and orphan tryptophan supercohesion groups obtained from 47 organisms are compared with the TyrA cohesion groups present in the same organisms (see below). Tyrosine pathway cohesion groups and tryptophan pathway supercohesion groups cannot be expected to correspond with one another perfectly for the following reasons. First, intruder sequences that become established in a given organism for one pathway will not generally be present for another pathway. Second, the sequence length and degree of conservation of the protein(s) upon which cohesion groups are based will dictate different relative resolving powers. Because the Trp enzyme concatenate trees are more robust than the single-enzyme TyrA trees, it is expected that some Trp supercohesion groups would correspond to multiple TyrA cohesion groups. Finally, aside from the differential resolving powers of the particular proteins used to make trees, dynamic evolutionary changes that sometimes occur in a short time frame (evolutionary jumps) drive accelerated divergence that leads to separated cohesion groups or supercohesion groups. Thus, for example, TrpSCG-6 contains concatenates from Bacillus subtilis, B. stearothermophilus, and B. halodurans that are clearly separated from concatenates from other Bacillus species and from certain sister firmicute species (Lactococcus/Listeria/Staphylococcus/Streptococcus) that populate TrpSCG-7 (80). Dynamic and recent evolutionary events in the smaller clade that have driven rapid divergence are the insertion of the trp operon into a six-gene aro operon; the loss of a gene encoding a histidine pathway aminotransferase from the histidine operon, forcing an aromatic aminotransferase in the aro operon to take on a dual function; and the loss of trpAb from the trp operon, forcing pabAb to assume a dual function. 
In contrast, TyrCG-18 is a large cohesion group that contains TyrA members from all of the organisms corresponding to TrpSCG-6 and TrpSCG-7. Thus, on the one hand, the B. subtilis/B. halodurans/B. stearothermophilus trio has experienced an evolutionary jump that led to a dramatic divergence with respect to the tryptophan pathway (see “Intra-Cohesion-Group Intruders” above for a proposed scenario for this evolutionary jump). On the other hand, only a shallow, graded divergence occurred for TyrA throughout this large clade of firmicutes, with the result that TyrA from the B. subtilis/B. halodurans/B. stearothermophilus trio occupies a common cohesion group with TyrA proteins from Bacillus, Listeria, Staphylococcus, Streptococcus, and Lactococcus. In previous studies of the tryptophan pathway (78, 80), a substantial fraction of the genomes and the corresponding taxonomic representation were absent compared to the much greater abundance of genomes available for the TyrA cohesion group study. Thus, in the following sections, discussion is limited to those TyrA cohesion groups existing in organisms where Trp supercohesion groups were also studied. Lower Gammaproteobacteria Lower Gammaproteobacteria (67) refers to a lineage within the Gammaproteobacteria that we consider to be the equivalent of a superorder. Its membership is drawn from the orders Enterobacterales, Vibrionales, and Pasteurellales and most of the Alteromonadales. Except for intruder sequences, TrpSCG-1 and TyrCG-1 possess sequences from exactly the same phylogenetic grouping, namely, the lower Gammaproteobacteria. Both the pathway of l-tryptophan biosynthesis and the TyrA subsystem can be considered to have experienced evolutionary jumps that, together with other features of aromatic amino acid biosynthesis, have separated the lower Gammaproteobacteria from the upper Gammaproteobacteria. The suite of evolutionary events relevant to l-tryptophan biosynthesis is discussed above. 
The evolutionary jump for TyrA is presumably tied to the gene fusion event with the gene encoding chorismate mutase.

TrpSCG-1 contains whole-operon intruders that reside in contemporary Helicobacter pylori and in coryneform bacteria. TyrCG-1 contains intruder sequences that reside in species of Nostoc (a lineage within cyanobacteria). The trp operon LGT events resulted in a total displacement of the native trp genes, but the functional role of performing l-tryptophan biosynthesis remained exactly the same. In contrast, the tyrA intruders in Nostoc did not displace the native orthologs and are thought to exercise another functional role in secondary metabolism (68). Each of the three LGT events was relatively recent, since the intruder sequences in H. pylori are absent from other Epsilonproteobacteria, those present in coryneform bacteria are absent from other actinomycete bacteria, and those present in Nostoc are absent from other cyanobacteria.

Upper Gammaproteobacteria and Betaproteobacteria

We consider the upper Gammaproteobacteria to be the equivalent of a second superorder within the class Gammaproteobacteria. Their TyrA membership is drawn sparsely from the order Alteromonadales as well as from the remaining orders not listed above for the lower Gammaproteobacteria. TrpSCG-2 contained concatenate sequences from not only the upper Gammaproteobacteria but also the Betaproteobacteria. In contrast, TyrA sequences from the upper Gammaproteobacteria have been placed into 10 cohesion groups (visualized in green on Fig. 6,

Whereas TrpSCG-2 contains two cases of partial-pathway operon LGT, no intruders have so far been found to be present in any of the 20 TyrA cohesion groups that populate the upper Gammaproteobacteria and Betaproteobacteria (although, as mentioned above, the TyrA protein from C. burnetii in TyrCG-26 could possibly be a xenolog intruder).
Alphaproteobacteria

TrpSCG-3 contained Trp concatenates from the three Alphaproteobacteria genomes available at the time. TyrCG-12 presently contains TyrA sequences from the same organisms plus an additional 34 organisms. Only an orphan (Pelagibacter ubique), a member of TyrCG-26 (unresolved phylogenetic mixture), and the small membership of TyrCG-30 possess TyrA sequences that do not belong to TyrCG-12 (Fig. 6,

Epsilonproteobacteria

A Trp concatenate was previously available from only a single epsilonproteobacterium, this being the above-mentioned whole-operon intruder present in Helicobacter pylori. However, differences in gene organization and gene fusion noted for three other Epsilonproteobacteria (even though they lack status as complete genomes) indicated that this group has experienced dynamic evolutionary changes with respect to the Trp pathway. In contrast, the TyrA sequences from the 10 currently available genomes of Epsilonproteobacteria coexist as a cohesive grouping in TyrCG-13. The Epsilonproteobacteria exemplify a second case where dynamic evolutionary events in tryptophan biosynthesis have driven dramatic cohesion group divergence, in contrast to the modest tyrosine pathway divergence that accounts for a single TyrA cohesion group.

Interestingly, TyrCG-13 also contains nine TyrA sequences that reside in the class Flavobacteria of the phylum Bacteroidetes. Given the occupation of TyrCG-13 by a mixture of nearly equal numbers of sequences from the class Epsilonproteobacteria and from the class Flavobacteria, a common ancestor of either group could a priori have been the recipient of a xenolog intruder originating from the other. It is concluded that the intruder sequences are the ones hosted by Flavobacteria based upon the rationale developed in the last section of this article and summarized by Fig. 9

Deltaproteobacteria

Trp concatenates from the only two Deltaproteobacteria previously available for study were divergent orphans.
TyrA sequences from these same bacteria are also clearly divergent. One of them, from Geobacter sulfurreducens, occupies TyrCG-14 along with seven other sequences (TyrAα subhomology grouping). The other, from Desulfovibrio desulfuricans, occupies TyrCG-15 along with one other sequence (TyrAβ subhomology grouping). Three additional Deltaproteobacteria contain TyrA sequences that do not belong to the former two cohesion groups. The TyrA sequence from Syntrophobacter fumaroxidans is an orphan (TyrAβ subhomology grouping); TyrA from Anaeromyxobacter dehalogenans belongs to TyrCG-27 (TyrAα subhomology grouping), which is an unresolved phylogenetic mixture; and TyrA from Myxococcus xanthus is a xenolog intruder of TyrCG-30 (TyrAβ subhomology grouping).

Firmicutes

Tryptophan pathway concatenates found in the firmicute bacteria partitioned into two multisequence cohesion groups and two orphan sequences. TyrA sequences were distributed as four multisequence cohesion groups and two orphans. As discussed above, the membership of TrpSCG-6 and TrpSCG-7 was contributed by organisms whose TyrA proteins fell into a single TyrA cohesion group, TyrCG-18 (referred to as Firmicutes_1 in Table 2). Both the Trp pathway concatenate and TyrA from Desulfitobacterium hafniense were orphans. The orphan Trp pathway concatenate from Clostridium acetobutylicum corresponds to the three-member TyrCG-19 (referred to as Firmicutes_2 in Table 2). Interestingly, one of the TyrA sequences in TyrCG-19 is from C. difficile, an organism which lacks the tryptophan pathway altogether. TyrCG-20 (Firmicutes_3), TyrCG-21 (Firmicutes_4), and the orphan from Syntrophomonas wolfei contain TyrA sequences from organisms that were not available for the Trp pathway concatenate analysis.

Cyanobacteria

All of the Trp pathway concatenates fell into a single cohesion group, TrpSCG-4, and all TyrA sequences that function for primary tyrosine biosynthesis also fell into a single cohesion group, TyrCG-16.
As discussed above, Nostoc species contain additional TyrA sequences that are intruder sequences belonging to TyrCG-1 and that have a specialized function in the synthesis of an indole alkaloid sunscreen agent (68).

Actinomycetes

Three tryptophan pathway concatenates from actinomycetes belonged to a single cohesion group, TrpSCG-5. Two additional actinomycete concatenates belonged to TrpSCG-1, but this was due to xenolog intrusion rather than to divergence. TyrA sequences from 33 organisms all belong to TyrCG-17. Only TyrA from Symbiobacterium thermophilum is a divergent orphan.

Emerging Perspective

Supercohesion groups and cohesion groups have now been formulated for tryptophan pathway enzyme concatenates and for the TyrA assemblage of proteins, respectively. Once events of LGT have been sorted out, substantial parallelism is seen in the organisms that share members of a Trp cohesion group or a Tyr cohesion group. Dynamic evolutionary jumps that drive rapid divergence have been discussed for both the Trp pathway and the TyrA subsystem. Such evolutionary jumps have the effect of compressing the cohesion group to a smaller membership, as was seen with the multiple events which attended the insertion of a trp operon into an aro operon in Geobacillus species and a small clade of Bacillus species (80).

It is noteworthy that the distinct separation of Gammaproteobacteria into lower Gammaproteobacteria and upper Gammaproteobacteria on the criterion of Trp cohesion groups is exactly paralleled on the criterion of Tyr cohesion groups. Other character states of enzymes performing early aromatic pathway reactions have been noted to exhibit qualitative differences that define the same taxonomic split. Kleeb et al. (44) observed two clusters on a phylogenetic tree of PheA sequences from Gammaproteobacteria (called by them Gammaproteobacteria I and Gammaproteobacteria II). The latter clusters correspond to lower Gammaproteobacteria and upper Gammaproteobacteria.
It is noteworthy that Trp concatenates from upper Gammaproteobacteria and from Betaproteobacteria defined a single supercohesion group. The above-mentioned phylogenetic tree of PheA sequences, having less resolving power than Trp concatenates, nevertheless exhibited neighboring clusters, albeit with weak bootstrap support (44). A phylogenetic tree of TyrA sequences (Fig. 2)

Identification in painstaking detail of qualitatively different character states of genes and their encoded products, their evolutionary progression in the vertical genealogy, and evolutionary acquisitions made via LGT can feasibly be accomplished for relatively small metabolic segments, such as the individual terminal branches of aromatic biosynthesis. Once coverage is completed for the entire pathway, including the minor vitamin-like branches, it should be apparent that evolutionary conclusions arrived at separately via steps that are essentially atomistic can be combined to describe evolutionary progressions at the whole-pathway level that reveal a larger gestalt of interlocking relationships. The next section illustrates examples of this approach.

TRACKING MILESTONE EVOLUTIONARY EVENTS ACROSS SUBSYSTEMS

Gene Fusion

The tyrA gene exhibits a very similar environment of neighboring genes in the upper Gammaproteobacteria and Betaproteobacteria (67). (In contrast, the lower Gammaproteobacteria have a much different gene synteny.) The proposed ancestral synteny is one in which tyrA is closely followed by aroF, and this gene order has been largely conserved in the upper Gammaproteobacteria and the Betaproteobacteria that reside in the TyrAα subhomology grouping. Given the tenacity of these gene proximities, it is not surprising that tyrA-aroF fusions exist in both the upper Gammaproteobacteria and the Betaproteobacteria (Fig. 6,

Did the tyrA-aroF fusion occur on a single occasion in the upper Gammaproteobacteria? An inspection of Fig.
2

Since the cohesion groups are defined such that there is little confidence in the order of branching, a tree that was based upon an alignment of all the TyrA-AroF fusion sequences with concatenated TyrA and AroF sequences from upper Gammaproteobacteria and Betaproteobacteria that lack the fusion was assembled (Fig. 7).

Aromatic Biosynthesis in the Subclass Actinobacteridae

Figure 8
Aromatic Biosynthesis in the Superphylum Bacteroidetes/Chlorobi

The Bacteroidetes/Chlorobi group, as a superphylum taxon, represents an ancient phylogenetic section. This taxon is chosen to illustrate how the evolutionary history of TyrA can be tracked in concert with other milestone events of aromatic biosynthesis (Fig. 9). In this superphylum, all TyrA proteins are NAD+ specific and of unknown specificity for the cyclohexadienyl substrate (NADTyrAx). Both the TyrAα and TyrAβ subhomology groups are represented in this superphylum, and it is suggested (Fig. 9)

The TyrCG-13 cohesion group is populated by TyrA sequences from not only the class Flavobacteria but also the class Epsilonproteobacteria. Hence, a fairly ancient event of LGT is implicated. As an isolated observation, it is difficult to know which of these two classes is likely to be the host of the intruder sequences and which is likely to be the donor. If these classes diverged within their phyla at different times, the most recently emerged class could not have been the LGT donor to a common ancestor of the more ancient class. The phylogenetic tree of organisms reported by Olsen et al. (57) shows the class Epsilonproteobacteria diverging from its sister classes of Proteobacteria at an earlier time than the class Flavobacteria diverged from sister classes of the phylum Bacteroidetes. It thus appears that the class Flavobacteria did not yet exist at the time of the common ancestor of Epsilonproteobacteria, and hence, no Flavobacteria could have been the donor. On the other hand, a member of Epsilonproteobacteria could have been an LGT donor of tyrA to a common ancestor of the Flavobacteria. If so, it appears that the resident tyrA gene was replaced by homologous recombination without disrupting the aro operon, of which tyrA is a member; i.e., the context of gene organization surrounding tyrA in the Flavobacteria fits into the larger context of the superphylum (Fig.
9).

OVERVIEW PERSPECTIVE

Small proteins that are not highly conserved represent a contemporary challenge for functional annotation. A difficult hurdle is determining what evolutionary distance is valid for making annotation transfers with respect to phylogenomic inference. The degree to which various functional alternatives will persist over evolutionary distance will vary for different protein families. The extent to which current annotations are correct depends upon generations of previous experimental work and is hugely assisted by a fraction of genes that are highly conserved and evolve in the face of many limitations and constraints due to their elegant and complex mechanisms. Within the aromatic pathway, an example would be 5-enolpyruvylshikimate-3-phosphate synthase, a highly specific enzyme that utilizes a complex catalytic mechanism. Such complexity facilitates reliable annotations. On the other hand, enzymes having the plasticity to catalyze broad-specificity reactions can be represented by entirely different homology groups or by distinctly different subhomology groups that can make functional predictions elusive. A multitude of proteins (exemplified by such enzymes as kinases, phosphatases, and dehydrogenases) illustrate the many and varied challenges for correct calls of functional role.

The TyrA protein family of dehydrogenases benefits from a treasure trove background of wide-ranging comparative enzymology. The current analysis, together with previous work, has been a labor-intensive effort. Comparable efforts are not easily fitted to goals of high-throughput annotations for thousands of sequences in many hundreds of organisms, hence the dilemma of rapid results achieved with a lesser quality of annotation accuracy than one would like.
“Difficult” gene products require a labor-intensive effort as a useful step in order to generate and preserve the information needed to allow the rich array of bioinformatic tools available to succeed in increasing the quality of high-throughput annotation efforts.

Acknowledgments

We acknowledge partial support from contract HHSN266200400042C from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, and from grant G13 LM008297 from the National Library of Medicine.

APPENDIX

Determination of Cohesion Groups

TyrA sequences were collected from the SEED and from other public databases. A file of trimmed core supradomain TyrA sequences was created by trimming away obvious fused domains or extensions. At the N termini, all sequences were uniformly trimmed to begin five residues ahead of the GxGxxG motif, i.e., to match the beginning of the Wierenga fingerprint (73). C termini were trimmed of unconserved residues using the endpoints of some of the shortest TyrA proteins that have been fully characterized for guidance. ClustalX was used to create a preliminary alignment. This alignment was imported into the BioEdit sequence alignment editor. Manual adjustments were made to obtain a high-quality alignment. This alignment was used as input into a phylogenetic tree program (Phylip software, unweighted-pair group method using average linkages) (27). Trees were visualized with the TREEVIEW application (62). In the initial tree of 347 trimmed sequences, nodes were collapsed at bootstrap values of 68%. An arbitrarily chosen member of the collapsed groups was selected as a representative sequence of that node position. The resulting 64 sequences were used to obtain a second Phylip tree, which yielded 60 sequences with the collapse of a few more nodes when a bootstrap value of 68% was applied as a cutoff. An additional repetition of this process resulted in a final tally of 58 cohesion groups.
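The N-terminal trimming rule described above (begin five residues ahead of the GxGxxG motif) can be illustrated with a short Python sketch. This is not the authors' script. The function name `trim_n_terminus`, the use of the first motif occurrence, and the example sequence are all assumptions made for illustration; the C-terminal trimming, which was guided by characterized proteins, is not covered.

```python
import re

# GxGxxG: glycine, any residue, glycine, any two residues, glycine.
GXGXXG = re.compile(r"G.G..G")


def trim_n_terminus(seq, lead=5):
    """Return seq trimmed to start `lead` residues before the first GxGxxG.

    Assumption: the first occurrence of the pattern is the relevant
    nucleotide-binding motif. Returns None when no motif is found, so
    that such sequences can be inspected by hand.
    """
    match = GXGXXG.search(seq)
    if match is None:
        return None
    # Keep the whole N terminus if fewer than `lead` residues precede the motif.
    start = max(match.start() - lead, 0)
    return seq[start:]


# Hypothetical sequence fragment, not taken from any real TyrA protein.
example = "MSSQPKRVAIIGLGLMGGS"
trimmed = trim_n_terminus(example)
```

In practice, a sequence with several GxGxxG-like stretches would need manual review, which is consistent with the manual alignment adjustments described in the text.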
The ultimate collapsed tree (Fig. 2)

Web Resources at the SEED

TyrA subsystem home page. Resources relevant to the TyrA subsystem, individually described below, are linked to the home page at http://theseed.uchicago.edu/FIG/Html/tyrASubsystem.html. This includes the online interactive version of Fig. 2,

Navigating to and within the Protein Pages. The version of Fig. 2

One innovation in the extended table is a “gene neighborhood” button within each cohesion group section, which delivers a comparison of gene organization flanking tyrA within the cohesion group.

Sortable character state snapshots. The individual panels of Fig. 6

Semiautomation of cohesion groups. An important accomplishment would be to lock in and build upon the manual effort represented by this project with continuing semiautomatic follow-up. The technology to support the creation, curation, and advanced development of subsystems at the SEED was described previously (60). Tools to preserve the trimmed sequence alignment, accurately add newly available sequences, and update the tree and cohesion group assemblages are being implemented.

Web Resources at AroPath

The nomenclature of genes and gene products follows the rules posted under “Nomenclature: Genes/Enzymes” on the AroPath home page (http://aropath.lanl.gov). Aromatic pathway diagrams with complete biochemical structures, a list of attenuator structures associated with tyrA operons, and a tool (phyloTreeBuilder) to build 16S rRNA trees from selected organisms can be accessed from the home page.

A universal four-letter system for coding organisms to the species level with unambiguous acronyms has been developed (the first letter of the genus in capital letters followed by the first three letters of the species in lowercase type). When necessary to disambiguate a four-letter acronym, a number is attached. For example, Escherichia coli is designated Ecol, whereas Enterococcus columbae is designated Ecol-1.
If the species has not been determined, the first four letters of the genus are used (all in caps). To find a given four-letter acronym associated with an organism, a list of organisms currently in the system can be browsed by clicking the link under “organisms” entitled “browse organism acronyms” at the AroPath home page. Each organism is hyperlinked to the NCBI taxonomy browser. Each species entry can be expanded to show all of the component strains and their corresponding absolute acronyms (see below). In addition, a tool to generate an acronym that is unique at the level of a specific strain, designated an absolute acronym, is provided. A given strain or list of strains can be uploaded to AroPath by clicking the link under “organisms” entitled “get absolute acronym.” This will enable the return of an absolute acronym that is a unique identifier at the strain level. Any strain for which an absolute acronym has not been previously requested will automatically be assigned a unique designation, which will be held permanently in the database. Finally, a useful tool is provided to amend personal sequence files to be used for obtaining multiple sequence alignments and phylogenetic trees such that key acronym information for both organism and protein are displayed in the sequence names. FASTA sequence files can be uploaded to AroPath by clicking the link under “organisms” entitled “convert sequence files,” and a converted output will be returned. For example, a sequence name returned that begins “>Ecol_J_F_AroA_b,” when used as input in a tree-building program, will appear in that form as an informative label. It will indicate that the sequence is from Escherichia coli (Ecol) strain CFT073 (_J), that the sequence is from a finished genome (_F) rather than an unfinished genome (_U), and that the sequence is one of multiple AroA paralogs (AroA_b). 
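The acronym rules and the sequence-name layout described above can be sketched in Python. This is an illustration only, not AroPath code: the function names, the dictionary keys, and the assumption that the name fields are separated by underscores in the fixed order organism, strain code, genome status, protein, and optional paralog letter are all inferred from the single example “>Ecol_J_F_AroA_b”.

```python
# Genome status codes as described in the text: _F finished, _U unfinished.
GENOME_STATUS = {"F": "finished", "U": "unfinished"}


def species_acronym(genus, species=None, disambiguator=None):
    """Build a four-letter acronym following the rules stated in the text.

    With a species: first letter of the genus (uppercase) plus the first
    three letters of the species (lowercase). Without a species: the first
    four letters of the genus in capitals. An optional number is attached
    with a hyphen, as in the Ecol-1 example.
    """
    if species:
        acronym = genus[0].upper() + species[:3].lower()
    else:
        acronym = genus[:4].upper()
    if disambiguator is not None:
        acronym = f"{acronym}-{disambiguator}"
    return acronym


def parse_sequence_name(name):
    """Split a converted sequence name into its labeled parts.

    Assumed layout (hypothetical generalization of one example):
    >Organism_Strain_Status_Protein[_paralog]
    """
    fields = name.lstrip(">").split("_")
    if len(fields) not in (4, 5):
        raise ValueError(f"unexpected number of fields in {name!r}")
    organism, strain_code, status, protein = fields[:4]
    if status not in GENOME_STATUS:
        raise ValueError(f"unknown genome status code {status!r}")
    return {
        "organism": organism,
        "strain_code": strain_code,
        "genome_status": GENOME_STATUS[status],
        "protein": protein,
        "paralog": fields[4] if len(fields) == 5 else None,
    }


parsed = parse_sequence_name(">Ecol_J_F_AroA_b")
```

The mapping from a strain code such as J to a strain such as CFT073 is held in the AroPath database and is deliberately not modeled here.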
If a hypothetical organism possessed a single gene product, two paralogs, or three paralogs, the corresponding designations would be AroA; AroA_a and AroA_b; and AroA_a, AroA_b, and AroA_c.

REFERENCES

1. Abou-Zeid, A., G. Euverink, G. I. Hessels, R. A. Jensen, and L. Dijkhuizen. 1995. Biosynthesis of l-phenylalanine and l-tyrosine in the actinomycete Amycolatopsis methanolica. Appl. Environ. Microbiol. 61:1298-1302.
2. Afriat, L., C. Roodveldt, G. Manco, and D. S. Tawfik. 2006. The latent promiscuity of newly identified microbial lactonases is linked to a recently diverged phosphotriesterase. Biochemistry 45:13677-13686.
3. Aharoni, A., L. Gaidukov, O. Khersonsky, S. M. Gould, C. Roodveldt, and D. S. Tawfik. 2005. The ‘evolvability’ of promiscuous protein functions. Nat. Genet. 37:73-76.
4. Ahmad, S., and R. A. Jensen. 1988. The phylogenetic origin of the bifunctional tyrosine-pathway protein in the enteric lineage of bacteria. Mol. Biol. Evol. 5:282-297.
5. Ahmad, S., and R. A. Jensen. 1987. The prephenate dehydrogenase component of the bifunctional T-protein in enteric bacteria can utilize L-arogenate. FEBS Lett. 216:133-139.
6. Barona-Gómez, F., and D. A. Hodgson. 2003. Occurrence of a putative ancient-like isomerase involved in histidine and tryptophan biosynthesis. EMBO Rep. 4:296-300.
7. Blanc, V., P. Gil, N. Bamas-Jacques, S. Lorenzon, M. Zagorec, J. Schleuniger, D. Bisch, F. Blanche, L. Debussche, J. Crouzet, and D. Thibaut. 1997. Identification and analysis of genes from Streptomyces pristinaespiralis encoding enzymes involved in the biosynthesis of the 4-dimethylamino-L-phenylalanine precursor of pristinamycin I. Mol. Microbiol. 23:191-202.
8. Bonner, C. A., R. S. Fischer, S. Ahmad, and R. A. Jensen. 1990. Remnants of an ancient pathway to l-phenylalanine and l-tyrosine in enteric bacteria: evolutionary implications and biotechnological impact. Appl. Environ. Microbiol. 56:3741-3747.
9. Bonner, C. A., R. S. Fischer, R. R. Schmidt, P. W. Miller, and R. A. Jensen. 1995. Distinctive enzymes of aromatic amino acid biosynthesis that are highly conserved in land plants are also present in the chlorophyte alga Chlorella sorokiniana. Plant Cell Physiol. 36:1013-1022.
10. Bonner, C. A., R. A. Jensen, J. E. Gander, and N. O. Keyhani. 2004. A core catalytic domain of the TyrA protein family: arogenate dehydrogenase from Synechocystis. Biochem. J. 382:279-291.
11. Bonvin, J., R. A. Aponte, M. Marcantonio, S. Singh, D. Christendat, and J. L. Turnbull. 2006. Biochemical characterization of prephenate dehydrogenase from the hyperthermophilic bacterium Aquifex aeolicus. Protein Sci. 15:1417-1432.
12. Brown, K. A., E. P. Carpenter, K. A. Watson, J. R. Coggins, A. R. Hawkins, M. H. Koch, and D. I. Svergun. 2003. Twists and turns: a tale of two shikimate-pathway enzymes. Biochem. Soc. Trans. 31:543-547.
13. Byng, G. S., R. J. Whitaker, R. L. Gherna, and R. A. Jensen. 1980. Variable enzymological patterning in tyrosine biosynthesis as a means of determining natural relatedness among the Pseudomonadaceae. J. Bacteriol. 144:247-257.
14. Byng, G. S., R. J. Whitaker, C. L. Shapiro, and R. A. Jensen. 1981. The aromatic amino acid pathway branches at l-arogenate in Euglena gracilis. Mol. Cell. Biol. 1:426-438.
15. Calhoun, D. H., D. L. Pierson, and R. A. Jensen. 1973. Channel-shuttle mechanism for the regulation of phenylalanine and tyrosine synthesis at a metabolic branch point in Pseudomonas aeruginosa. J. Bacteriol. 113:241-251.
16. Catrina, I., P. J. O'Brien, J. Purcell, I. Nikolic-Hughes, J. G. Zalatan, A. C. Hengge, and D. Herschlag. 2007. Probing the origin of the compromised catalysis of E. coli alkaline phosphatase in its promiscuous sulfatase reaction. J. Am. Chem. Soc. 129:5760-5765.
17. Champney, W. S., and R. A. Jensen. 1970. The enzymology of prephenate dehydrogenase in Bacillus subtilis. J. Biol. Chem. 245:3763-3770.
18. Chen, S., S. Vincent, D. B. Wilson, and B. Ganem. 2003. Mapping of chorismate mutase and prephenate dehydrogenase domains in the Escherichia coli T-protein. Eur. J. Biochem. 270:757-763.
19. de Kraker, J. W., K. Luck, S. Textor, J. G. Tokuhisa, and J. Gershenzon. 2006. Two Arabidopsis genes (IPMS1 and IPMS2) encode isopropylmalate synthase, the branchpoint step in the biosynthesis of leucine. Plant Physiol. 143:970-986.
20. Domenech, J., and J. Ferrer. 2006. A new D-2-hydroxyacid dehydrogenase with dual coenzyme-specificity from Haloferax mediterranei, sequence analysis and heterologous overexpression. Biochim. Biophys. Acta 1760:1667-1674.
21. Doong, R.-L., R. Ganson, and R. A. Jensen. 1993. Plastid-localized 3-deoxy-D-arabino-heptulosonate 7-phosphate synthase (DS-Mn): the early-pathway target of sequential feedback inhibition in higher plants. Plant Cell Environ. 16:393-402.
22. Embley, T. M., and E. Stackebrandt. 1994. The molecular phylogeny and systematics of the actinomycetes. Annu. Rev. Microbiol. 48:257-289.
23. Fani, R., M. Brilli, and P. Lio. 2005. The origin and evolution of operons: the piecewise building of the proteobacterial histidine operon. J. Mol. Evol. 60:378-390.
24. Fazel, A. M., J. R. Bowen, and R. A. Jensen. 1980. Arogenate (pretyrosine) is an obligatory intermediate of L-tyrosine biosynthesis: confirmation in a microbial mutant. Proc. Natl. Acad. Sci. USA 77:1270-1273.
25. Fazel, A. M., and R. A. Jensen. 1979. Obligatory biosynthesis of L-tyrosine via the pretyrosine branchlet in coryneform bacteria. J. Bacteriol. 138:805-815.
26. Fazel, A. M., and R. A. Jensen. 1980. Regulation of prephenate dehydratase in coryneform species of bacteria by L-phenylalanine and by remote effectors. Arch. Biochem. Biophys. 200:165-176.
27. Felsenstein, J. 1989. PHYLIP - phylogeny inference package (version 3.2). Cladistics 5:164-166.
28. Gutierrez-Preciado, A., R. A. Jensen, C. Yanofsky, and E. Merino. 2005. New insights into regulation of the tryptophan biosynthetic operon in gram-positive bacteria. Trends Genet. 21:432-436.
29. Hall, G. C., M. B. Flick, R. L. Gherna, and R. A. Jensen. 1982. Biochemical diversity for biosynthesis of aromatic amino acids among the cyanobacteria. J. Bacteriol. 149:65-78.
30. Heath, R. J., and C. O. Rock. 2000. A triclosan-resistant bacterial enzyme. Nature 406:145-146.
31. Hirano, S. I., M. Morikawa, K. Takano, T. Imanaka, and S. Kanaya. 2007. Gentisate 1,2-dioxygenase from Xanthobacter polyaromaticivorans 127W. Biosci. Biotechnol. Biochem. 71:192-199.
32. Ingram-Smith, C., and K. S. Smith. 2006. AMP-forming acetyl-CoA synthetases in Archaea show unexpected diversity in substrate utilization. Archaea 2:95-107.
33. Itoh, T., K. Takemoto, H. Mori, and T. Gojobori. 1999. Evolutionary instability of operon structures disclosed by sequence comparisons of complete microbial genomes. Mol. Biol. Evol. 16:332-346.
34. Jensen, R. A. 1976. Enzyme recruitment in evolution of new function. Annu. Rev. Microbiol. 30:409-425.
35. Jensen, R. A., and S. Ahmad. 1990. Nested gene fusions as markers of phylogenetic branchpoints in prokaryotes. Trends Ecol. Evol. 5:219-224.
36. Jensen, R. A., and W. Gu. 1996. Evolutionary recruitment of biochemically specialized subdivisions of family I within the protein superfamily of aminotransferases. J. Bacteriol. 178:2161-2171.
37. Jensen, R. A., and E. W. Nester. 1965. The regulatory significance of intermediary metabolites: control of aromatic acid biosynthesis by feedback inhibition in Bacillus subtilis. J. Mol. Biol. 12:468-481.
38. Jensen, R. A., G. Xie, D. H. Calhoun, and C. A. Bonner. 2002. The correct phylogenetic relationship of KdsA (3-deoxy-d-manno-octulosonate 8-phosphate synthase) with one of two independently evolved classes of AroA (3-deoxy-d-arabino-heptulosonate 7-phosphate synthase). J. Mol. Evol. 54:416-423.
39. Keller, B., E. Keller, H. Gorisch, and F. Lingens. 1983. Biosynthesis of phenylalanine and tyrosine in streptomycetes. Hoppe Seyler's Z. Physiol. Chem. 364:455-459. (In German.)
40. Keller, B., E. Keller, and F. Lingens. 1985. Arogenate dehydrogenase from Streptomyces phaeochromogenes. Purification and properties. Biol. Chem. Hoppe-Seyler 366:1063-1066.
41. Khersonsky, O., C. Roodveldt, and D. S. Tawfik. 2006. Enzyme promiscuity: evolutionary and mechanistic aspects. Curr. Opin. Chem. Biol. 10:498-508.
42. Kino, K., S. Kuratsu, A. Noguchi, M. Kokubo, Y. Nakazawa, T. Arai, M. Yagasaki, and K. Kirimura. 2007. Novel substrate specificity of glutathione synthesis enzymes from Streptococcus agalactiae and Clostridium acetobutylicum. Biochem. Biophys. Res. Commun. 352:351-359.
43. Kino, K., M. Sato, M. Yoneyama, and K. Kirimura. 2007. Synthesis of DL-tryptophan by modified broad specificity amino acid racemase from Pseudomonas putida IFO 12996. Appl. Microbiol. Biotechnol. 73:1299-1305.
44. Kleeb, A. C., P. Kast, and D. Hilvert. 2006. A monofunctional and thermostable prephenate dehydratase from the archaeon Methanocaldococcus jannaschii. Biochemistry 45:14101-14110.
45. Kunzler, D. E., S. Sasso, M. Gamper, D. Hilvert, and P. Kast. 2005. Mechanistic insights into the isochorismate pyruvate lyase activity of the catalytically promiscuous PchB from combinatorial mutagenesis and selection. J. Biol. Chem. 280:32827-32834.
46. Kurakin, A. 2007. Self-organization versus Watchmaker: ambiguity of molecular recognition and design charts of cellular circuitry. J. Mol. Recog. 20:205-214.
47. Lawrence, J. G. 1999. Gene transfer, speciation, and the evolution of genomes. Curr. Opin. Microbiol. 2:519-523.
48. Legrand, P., R. Dumas, M. Seux, P. Rippert, R. Ravelli, J. L. Ferrer, and M. Matringe. 2006.
Biochemical characterization and crystal structure of Synechocystis arogenate dehydrogenase provide insights into catalytic reaction. Structure 14767-776. [PubMed] 49. Liberles, J. S., M. Thorolfsson, and A. Martinez. 2005. Allosteric mechanisms in ACT domain containing enzymes involved in amino acid metabolism. Amino Acids 281-12. [PubMed] 50. Macchiarulo, A., I. Nobeli, and J. M. Thornton. 2004. Ligand selectivity and competition between enzymes in silico. Nat. Biotechnol. 221039-1045. [PubMed] 51. Mayer, E., S. Waldner-Sander, B. Keller, E. Keller, and F. Lingens. 1985. Purification of arogenate dehydrogenase from Phenylobacterium immobile. FEBS Lett. 179208-212. [PubMed] 52. Miller, B. G., and R. T. Raines. 2004. Identifying latent enzyme activities: substrate ambiguity within modern bacterial sugar kinases. Biochemistry 436387-6392. [PubMed] 53. Nazina, T. N., T. P. Tourova, A. B. Poltaraus, E. V. Novikova, A. A. Grigoryan, A. E. Ivanova, A. M. Lysenko, V. V. Petrunyaka, G. A. Osipov, S. S. Belyaev, and M. V. Ivanov. 2001. Taxonomic study of aerobic thermophilic bacilli: descriptions of Geobacillus subterraneus gen. nov., sp. nov. and Geobacillus uzenensis sp. nov. from petroleum reservoirs and transfer of Bacillus stearothermophilus, Bacillus thermocatenulatus, Bacillus thermoleovorans, Bacillus kaustophilus and Bacillus thermodenitrificans to Geobacillus as the new combinations G. stearothermophilus, G. thermoleovorans, G. kaustophilus, G. thermoglucosidasius and G. thermodenitrificans. Int. J. Syst. Evol. Microbiol. 51433-446. [PubMed] 54. Nishimasu, H., S. Fushinobu, H. Shoun, and T. Wakagi. 2007. Crystal structures of an ATP-dependent hexokinase with broad substrate specificity from the hyperthermophilic archaeon Sulfolobus tokodaii. J. Biol. Chem. 2829923-9931. [PubMed] 55. O'Brien, P. J., and D. Herschlag. 1999. Catalytic promiscuity and the evolution of new enzymatic activities. Chem. Biol. 6R91-R105. [PubMed] 56. Okvist, M., R. Dey, S. Sasso, E. 
Grahn, P. Kast, and U. Krengel. 2006. 1.6 A crystal structure of the secreted chorismate mutase from Mycobacterium tuberculosis: novel fold topology revealed. J. Mol. Biol. 3571483-1499. [PubMed] 57. Olsen, G. J., C. R. Woese, and R. Overbeek. 1994. The winds of (evolutionary) change: breathing new life into microbiology. J. Bacteriol. 1761-6. [PubMed] 58. Osterman, A. 2006. A hidden metabolic pathway exposed. Proc. Natl. Acad. Sci. USA 1035637-5638. [PubMed] 59. Osterman, A., and R. Overbeek. 2003. Missing genes in metabolic pathways: a comparative genomics approach. Curr. Opin. Chem. Biol. 7238-251. [PubMed] 60. Overbeek, R., T. Begley, R. M. Butler, J. V. Choudhuri, H. Y. Chuang, M. Cohoon, V. de Crecy-Lagard, N. Diaz, T. Disz, R. Edwards, M. Fonstein, E. D. Frank, S. Gerdes, E. M. Glass, A. Goesmann, A. Hanson, D. Iwata-Reuyl, R. Jensen, N. Jamshidi, L. Krause, M. Kubal, N. Larsen, B. Linke, A. C. McHardy, F. Meyer, H. Neuweger, G. Olsen, R. Olson, A. Osterman, V. Portnoy, G. D. Pusch, D. A. Rodionov, C. Ruckert, J. Steiner, R. Stevens, I. Thiele, O. Vassieva, Y. Ye, O. Zagnitko, and V. Vonstein. 2005. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 335691-5702. [PubMed] 61. Overbeek, R., M. Fonstein, M. D'Souza, G. D. Pusch, and N. Maltsev. 1999. The use of gene clusters to infer functional coupling. Proc. Natl. Acad. Sci. USA 962896-2901. [PubMed] 62. Page, R. D. M. 1996. TREEVIEW: an application to display phylogenetic trees on personal computers. Comp. Appl. Biosci. 12357-358. [PubMed] 63. Porat, I., B. W. Waters, Q. Teng, and W. B. Whitman. 2004. Two biosynthetic pathways for aromatic amino acids in the archaeon Methanococcus maripaludis. J. Bacteriol. 1864940-4950. [PubMed] 64. Rippert, P., and M. Matringe. 2002. Molecular and biochemical characterization of an Arabidopsis thaliana arogenate dehydrogenase with two highly similar and active protein domains. Plant Mol. Biol. 48361-368. 
[PubMed] 65. Ryding, N. J., T. B. Anderson, and W. C. Champness. 2002. Regulation of the Streptomyces coelicolor calcium-dependent antibiotic by absA, encoding a cluster-linked two-component system. J. Bacteriol. 184794-805. [PubMed] 66. Schwab, W. 2003. Metabolome diversity: too few genes, too many metabolites? Phytochemistry 62837-849. [PubMed] 67. Song, J., C. A. Bonner, M. Wolinsky, and R. A. Jensen. 2005. The TyrA family of aromatic-pathway dehydrogenases in phylogenetic context. BMC Biol. 313. [PubMed] 68. Soule, T., V. Stout, W. D. Swingley, J. C. Meeks, and F. Garcia-Pichel. 2007. Molecular genetics and genomic analysis of scytonemin biosynthesis in Nostoc punctiforme ATCC 29133. J. Bacteriol. 1894465-4472. [PubMed] 69. Stenmark, S. L., D. L. Pierson, F. I. Glover, and R. A. Jensen. 1974. Blue-green bacteria synthesize L-tyrosine by the pretyrosine pathway. Nature 247290-292. [PubMed] 70. Subramaniam, P., R. Bhatnagar, A. Hooper, and R. A. Jensen. 1994. The dynamic progression of evolved character states for aromatic amino acid biosynthesis in gram-negative bacteria. Microbiology 1403431-3440. [PubMed] 71. Sun, W., S. Singh, R. Zhang, J. L. Turnbull, and D. Christendat. 2006. Crystal structure of prephenate dehydrogenase from Aquifex aeolicus. Insights into the catalytic mechanism. J. Biol. Chem. 28112919-12928. [PubMed] 72. Vogel, C., M. Bashton, N. D. Kerrison, C. Chothia, and S. A. Teichmann. 2004. Structure, function and evolution of multidomain proteins. Curr. Opin. Struct. Biol. 14208-216. [PubMed] 73. Wierenga, R. K., P. Terpstra, and W. G. Hol. 1986. Prediction of the occurrence of the ADP-binding beta alpha beta-fold in proteins, using an amino acid sequence fingerprint. J. Mol. Biol. 187101-107. [PubMed] 74. Wolterink-van Loo, S., A. van Eerde, M. A. Siemerink, J. Akerboom, B. W. Dijkstra, and J. van der Oost. 2006. Biochemical and structural exploration of the catalytic capacity of Sulfolobus KDG aldolases. Biochem. J. 403421-430. 75. Xia, T., G. 
Zhao, R. S. Fischer, and R. A. Jensen. 1992. A monofunctional prephenate dehydrogenase created by cleavage of the 5′ 109 bp of the tyrA gene from Erwinia herbicola. J. Gen. Microbiol. 1381309-1316. [PubMed] 76. Xie, G., C. A. Bonner, T. Brettin, R. Gottardo, N. O. Keyhani, and R. A. Jensen. 2003. Lateral gene transfer and ancient paralogy of operons containing redundant copies of tryptophan-pathway genes in Xylella species and in heterocystous cyanobacteria. Genome Biol. 4R14. [PubMed] 77. Xie, G., C. A. Bonner, and R. A. Jensen. 2000. Cyclohexadienyl dehydrogenase from Pseudomonas stutzeri exemplifies a widespread type of tyrosine-pathway dehydrogenase in the TyrA protein family. Comp. Biochem. Physiol. C Toxicol. Pharmacol. 12565-83. [PubMed] 78. Xie, G., C. A. Bonner, J. Song, N. O. Keyhani, and R. A. Jensen. 2004. Inter-genomic displacement via lateral gene transfer of bacterial trp operons in an overall context of vertical genealogy. BMC Biol. 215. [PubMed] 79. Xie, G., T. S. Brettin, C. A. Bonner, and R. A. Jensen. 1999. Mixed-function supraoperons that exhibit overall conservation, albeit shuffled gene organization, across wide intergenomic distances within eubacteria. Microb. Comp. Genomics 45-28. [PubMed] 80. Xie, G., N. O. Keyhani, C. A. Bonner, and R. A. Jensen. 2003. Ancient origin of the tryptophan operon and the dynamics of evolutionary change. Microbiol. Mol. Biol. Rev. 67303-342. [PubMed] 81. Yanai, K., N. Sumida, K. Okakura, T. Moriya, M. Watanabe, and T. Murakami. 2004. para-Position derivatives of fungal anthelmintic cyclodepsipeptides engineered with Streptomyces venezuelae antibiotic biosynthetic genes. Nat. Biotechnol. 22848-855. [PubMed] 82. Zamir, L. O., R. A. Jensen, B. Arison, A. Douglas, G. Albers-Schonberg, and J. R. Bowen. 1980. Structure of arogenate (pretyrosine), an amino acid intermediate of aromatic biosynthesis. J. Am. Chem. Soc. 1024499-4504. 83. Zamir, L. O., R. Tiberio, K. A. Devor, F. Sauriol, S. Ahmad, and R. A. Jensen. 1988. 
Structure of D-prephenyllactate. A carboxycyclohexadienyl metabolite from Neurospora crassa. J. Biol. Chem. 26317284-17290. [PubMed] 84. Zeigler, D. R. 2005. Application of a recN sequence similarity analysis to the identification of species within the bacterial genus Geobacillus. Int. J. Syst. Evol. Microbiol. 551171-1179. [PubMed] 85. Zhang, L., B. Ahvazi, R. Szittner, A. Vrielink, and E. Meighen. 1999. Change of nucleotide specificity and enhancement of catalytic efficiency in single point mutants of Vibrio harveyi aldehyde dehydrogenase. Biochemistry 3811440-11447. [PubMed] 86. Zhao, G., T. Xia, L. O. Ingram, and R. A. Jensen. 1993. An allosterically insensitive class of cyclohexadienyl dehydrogenase from Zymomonas mobilis. Eur. J. Biochem. 212157-165. [PubMed] |