Protein Sci. Jul 2003; 12(7): 1418–1431.
PMCID: PMC2323934

Three monophyletic superfamilies account for the majority of the known glycosyltransferases

Abstract

Sixty-five families of glycosyltransferases (EC 2.4.x.y) have been recognized on the basis of high sequence similarity to a founding member with experimentally demonstrated enzymatic activity. Although distant sequence relationships between some of these families have been reported, the natural history of glycosyltransferases is poorly understood. We used iterative searches of sequence databases, motif extraction, structural comparison, and analysis of completely sequenced genomes to track the origins of modern-type glycosyltransferases. We show that >75% of recognized glycosyltransferase families belong to one of only three monophyletic superfamilies of proteins, namely, (1) a recently described GPGTF/GT-B superfamily; (2) a nucleoside-diphosphosugar transferase (GT-A) superfamily, which is characterized by a DxD sequence signature and also includes nucleotidyltransferases; and (3) a GT-C superfamily of integral membrane glycosyltransferases with a modified DxD signature in the first extracellular loop. Several developmental regulators in Metazoans, including Fringe and Egghead homologs, belong to the second superfamily. Interestingly, the Tout-velu/Exostosin family of developmental proteins, found in all multicellular eukaryotes, contains separate domains belonging to the first and the second superfamilies, explaining multiple glycosyltransferase activities in one protein.

Keywords: Glycosyltransferases, exostosin, Fringe, Egghead, protein sequence evolution

Glycosyltransferases (GTs; EC 2.4.x.y) transfer sugars to various targets, forming glycoside bonds. The acceptor molecules are specific for a given GT enzyme, but as a whole, display enormous diversity and include all types of macromolecules and numerous classes of low molecular-weight compounds. The common activated donors of sugars include nucleotide diphosphosugars, nucleotide monophosphosugars (in which cases, the formation of the activated donor itself involves an act of glycosyl transfer), and sugar phosphates.

On the basis of analogies with better-studied glycosidases, two main catalytic mechanisms for glycosyl transfer reaction have been proposed (Sinnott 1991). In the inverting mechanism, the acceptor is thought to perform a nucleophilic attack at C1 of the nucleotide diphosphosugar donor, and the anomeric configuration of the added sugar is changed (for instance, UDP-glucose→β-glucoside). In the retaining mechanism of glycosidases, the process is probably two-step, involving formation of a glycosyl-enzyme intermediate, release of the nucleoside diphosphate, and the subsequent attack of the glycosyl enzyme by the acceptor; the configuration of the transferred sugar is retained (for instance, UDP-glucose→α-glucoside). The existence of a glycosyl-enzyme intermediate of a retaining reaction, however, has never been demonstrated for any glycosyltransferase, and the validity of the mechanistic analogy with glycosidase reaction remains to be established (Withers et al. 2002).

In both types of glycosyl hydrolysis, the residues with acidic or polar side chains, often aspartates, are known to play the roles of general base and nucleophile (Sinnott 1991; McCarter and Withers 1994). Mechanistic evidence and the information obtained from the analysis of three-dimensional structures suggest an important functional role for the carboxylate residues in GT active centers, although in ways that may be different from what is known about the glycosyl hydrolase reaction. The so-called DxD motif, which is found in many groups of both inverting and retaining glycosyltransferases (Breton et al. 1998; Wiggins and Munro 1998; Breton and Imberty 1999; Unligil and Rini 2000), is thought to be involved in binding of a divalent cation, most commonly Mn2+ or Mg2+, and in catalysis. For example, in four representative structures of inverting glycosyltransferases (Protein Data Bank [PDB; http://www.rcsb.org] structures 1QGQ, 1J8X, 1FOA, and 1FGG), the last aspartate residue in the DxD motif binds the divalent metal ion (Tarbouriech et al. 2001). In the retaining galactosyltransferase LgtC (PDB structure 1G9R) complexed with Mn2+ and UDP-2F-galactose, a single Mn2+ is coordinated by the two phosphate oxygens of UDP as well as the side-chain atoms of His244, Asp103, and Asp105 (the two Asp residues from the DxD motif). Asp103 provides one side-chain oxygen, and Asp105 provides both side-chain oxygen atoms in a bidentate interaction (Persson et al. 2001). Although it is conceivable that the bound divalent cation acts in the catalysis by polarizing a water molecule, which then may participate in the attack at C1, the exact identity of either nucleophile or general base has not been demonstrated directly in these cases.
In another set of structures, which belong to the GPGTF (Wrabl and Grishin 2001) superfamily (also known as GT-B, e.g., Bourne and Henrissat 2001), there is no evidence of a bound metal ion associated with catalysis, but there are several partially conserved acidic residues that are involved in interactions with the substrate, and, in some enzymes, the catalytic role has been proposed for two glutamic acid residues in the carboxy-terminal E-X7-E motif (Cid et al. 2000).
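The two signatures named above, DxD and E-X7-E, are simple enough to locate with regular expressions. The following is a minimal sketch, assuming that "x" stands for any single residue and that overlapping hits should be reported; the function name and the toy sequence are hypothetical and do not come from any real glycosyltransferase. A hit of such a short pattern is, of course, not by itself evidence of homology or of function.

```python
import re

# Patterns written from the motif names in the text; "x" is read as "any residue".
# The lookahead (?=...) lets overlapping occurrences be reported.
DXD = re.compile(r"(?=(D[A-Z]D))")        # Asp, any residue, Asp
EX7E = re.compile(r"(?=(E[A-Z]{7}E))")    # Glu, seven residues, Glu

def find_motif(pattern, sequence):
    """Return (1-based start position, matched residues) for every hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]

# Hypothetical toy sequence, assembled piecewise so the spacing is explicit.
toy = "MKVLDADGST" + "E" + "A" * 7 + "E" + "KLDDD"

print(find_motif(DXD, toy))
print(find_motif(EX7E, toy))
```

In practice such a scan would be applied to aligned sequences, because only hits at the equivalent alignment position are comparable between proteins.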

Glycosyltransferases catalyze what has been called the most important transfer reaction on earth, considering the biomass involved in turnover of such polysaccharides as chitin, cellulose, starch, glycogen, and microbial cell wall components (Law and Reid 1995). Protein glycosylation, moreover, mediates crucial regulatory events in metazoan development. For example, boundary formation in developing fruit-fly embryo requires Fringe-dependent elongation of O-linked fucose on specific EGF-like repeats in the extracellular domains of Notch receptors, and that modification modulates the activation of Notch, in turn regulating the activation of Notch target genes (Bruckner et al. 2000; Munro and Freeman 2000). Mammalian tumor suppressor exostosin has a glycosyltransferase, namely a heparan sulfate copolymerase, activity, and its fruit-fly homolog, Tout-velu, is required for regulation of movement of patterning factor Hedgehog (Bellaiche et al. 1998). Mutations in glycosyltransferases result in various human diseases, such as Gilbert syndrome (Online Mendelian Inheritance in Man [OMIM] database entry 143500), Crigler-Najjar type I (OMIM 218800), and type II syndromes (OMIM 606785), in which the UDP-glucuronosyltransferase gene UGT1A1 is mutated, and muscle-eye-brain disease (OMIM 253280), which is caused by mutations in O-mannose β-1,2-N-acetylglucosaminyltransferase, POMGNT1. Our future insights into the developmental processes in normal and diseased states, as well as into cellular intermediate metabolism, will be greatly aided by deep understanding of the sequence-structure-function relationships of glycosyltransferases.

A database of enzymes involved in carbohydrate metabolism is maintained by the Glycobiology unit at AFMB-CNRS in Marseille, France (http://afmb.cnrs-mrs.fr/CAZY/index.html). The glycosyltransferase section of the database contains >7000 sequences, organized into 65 families on the basis of high sequence similarity to one or more founding members with experimentally demonstrated GT activity. In addition, proteins with similar sequences, but different catalytic mechanisms, tend to be placed in separate families, as in the case of a subset of polypeptide GalNAc transferases that were removed from the GT-2 family after being recognized as retaining enzymes, and grouped into a new family, GT-27. A few other distant similarities between the CAZy families have been noted, suggesting that some families within CAZy share common ancestors (Campbell et al. 1997). A natural classification of GTs that would suggest the evolutionary events leading to the emergence of the present-day GT sequences is, however, still unavailable.

Probabilistic methods of database searching, such as PSI-BLAST (Altschul et al. 1997) and HMMer (Eddy 1998), use evolutionary models of a group of related sequences to detect homologs with lower sequence similarity. Statistical theory allows one to validate weak sequence matches detected in that way, by showing that they are unlikely to have arisen by chance alone (Karlin and Altschul 1990). In the case of glycosyltransferases, distant evolutionary relationships between, and a monophyletic origin of, 15 families in the CAZy database have been demonstrated recently using PSI-BLAST searches (Wrabl and Grishin 2001). The newly established superfamily is an extension of what has also been called the GT-B family. The extended superfamily includes >2700 proteins, which represent all three domains of life and almost every completely sequenced genome so far (with the exception of the small genomes of Mollicutes). For several members of the GT-B superfamily, the three-dimensional structures of their catalytic domains have been determined and are virtually the same, consisting of a duplicated Rossmann-like αβα fold. In addition to the bona fide glycosyltransferases, the GT-B family includes other enzymes involved in sugar metabolism, such as sugar epimerases (Wrabl and Grishin 2001), adding to the growing list of examples in which the catalytic activity is thought to have changed during the evolution of a sequence family (Mushegian and Koonin 1994; Copley and Bork 2000; Smit and Mushegian 2000; Nagano et al. 2002).

In this work, we undertook detailed sequence and structure comparison of the 65 families of glycosyltransferases contained in the CAZy database as of March 7, 2003. In addition to further expanding the GT-B superfamily to include at least 20 CAZy families, we delineate 2 other large monophyletic superfamilies of glycosyltransferases (Fig. 1). One superfamily is the largest of all known GT superfamilies, including 22 CAZy families, as well as a large group of nucleotidyltransferases, and is an extension of the previously defined GT-A family (Bourne and Henrissat 2001). The known three-dimensional structures of several members of this superfamily are all classified as a Rossmann-like nucleoside diphosphosugar transferase (NDS) fold (http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.d.hj.A.html). The other newly defined superfamily includes eight CAZy families of integral membrane proteins, with distant sequence similarity and partial conservation of transmembrane topology. The alignment of the superfamily displays several conserved charged residues located in the extracytoplasmic loops, in which they are likely to play a role in substrate binding or catalysis. We propose to call the latter superfamily GT-C. Ubiquitous distribution of GT-A and GT-B in three domains of life suggests the ancient origin of both superfamilies.

Figure 1.
Monophyletic groups identified by sequence-similarity analysis of proteins included in the CAZy database. (Yellow shading) GT-A, (gray shading) GT-B, (blue shading) GT-C. Smaller superfamilies are indicated by green and by tan shading. (Orange shading) ...

Results and Discussion

Glycosyltransferase superfamily A (GT-A): Sequence-level comparison indicates that nucleoside diphosphosugar metal-dependent glycosyltransferases and nucleotidyltransferases share spatial structure and form a monophyletic superfamily

Our interest in the NDS transferases started from the analysis of the Fringe family of developmental regulators. Fringe is a glycosyltransferase that binds Notch receptors and acts in the Golgi complex, adding GlcNAc to an O-fucose linkage on specific EGF-like repeats in the extracellular domain of Notch, which in turn lowers Notch sensitivity to Serrate-like ligands and concomitantly raises its sensitivity to Delta-like ligands (Bruckner et al. 2000; Ju et al. 2000; Moloney et al. 2000; Munro and Freeman 2000). Glycosyltransferase activity of this protein (CAZy family GT-31) has been predicted by comparative sequence analysis, on the basis of similarity to bacterial glycosyltransferase Lex1 (CAZy family GT-25; Yuan et al. 1997). That similarity, however distant, extended along most of the lengths of the aligned proteins and suggested that the structure of GTs from GT-31 and GT-25 might also be similar. Conserved tripeptide DDD, preceded by a predicted β-strand, was reminiscent of the strand-DxD motif, already recognized in several other GT families (Saxena et al. 1995). Subsequent experiments confirmed that Fringe proteins were required for glycosylation of Notch, and that mutagenesis of the conserved aspartic acid residues abolished the GT activity (Bruckner et al. 2000; Ju et al. 2000; Moloney et al. 2000; Munro and Freeman 2000).

Functionally important, yet short amino acid motifs such as the DxD motif, may be signals of the common ancestry of the enzymes that share them, or could have evolved independently, that is, convergently, in different lineages of evolutionarily unrelated GTs. To understand the evolutionary relationships between GT families, one needs to distinguish between the above two scenarios. Sometimes, the similarity of the three-dimensional structures is taken as a further indication of the common origin of the two proteins sharing a short sequence motif. Structure of several glycosyltransferases with DxD motif is known and is basically the same, namely a Rossmann-like αβα three-layer with seven-stranded β-sheet of the 3214657 topology, in which strand 6 is antiparallel to the rest (http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.d.hj.A.html). Some (but not all) nucleotidyltransferases, enzymes that are mechanistically close to GTs, also possess the same fold and variations of the functionally important DxD tripeptide (Blankenfeldt et al. 2000; Mosimann et al. 2001; Olsen and Roderick 2001). The problem with the structural argument, however, is that there is no rigorous statistical theory allowing one to distinguish between convergent and divergent three-dimensional structures by direct comparison of the atomic coordinates. In contrast, statistics of random versus nonrandom (evolutionarily relevant) sequence matches is well understood (Karlin and Altschul 1990). Thus, the strongest available support for the common origin and divergent evolution of the two three-dimensional protein structures comes from matching their sequences in the context of large sequence databases (Aravind and Koonin 1999; Copley and Bork 2000; Nagano et al. 2002). Therefore, we obtained the statistics of matching all GT sequences to each other, using the nonredundant protein sequence database at NCBI as the search space.
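To illustrate why sequence matches, unlike structural matches, can be assigned a significance, the Karlin-Altschul expectation for a local alignment score can be written as E = K·m·n·exp(−λS), where m and n are the query and database lengths. The sketch below is a simplified illustration only: the function names are ours, the values of λ and K are placeholders that depend on the scoring system, and real BLAST searches additionally apply edge-effect corrections to m and n.

```python
import math

def expected_chance_hits(raw_score, query_len, db_len, lam, k):
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

def bit_score(raw_score, lam, k):
    """Normalized score S' = (lambda * S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)

# Illustrative placeholder parameters (assumptions, not values taken from the paper).
LAM, K = 0.267, 0.041
QUERY_LEN, DB_LEN = 300, 5e8

for score in (40, 80, 120):
    e = expected_chance_hits(score, QUERY_LEN, DB_LEN, LAM, K)
    print(f"raw score {score}: bits = {bit_score(score, LAM, K):.1f}, E = {e:.3g}")
```

Because E falls exponentially with the score, a modest increase in alignment score changes a match from expected-by-chance to highly improbable, which is what makes a database-wide E-value threshold usable as a criterion for homology.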

Defining evolutionarily ancient, diverse sequence superfamilies requires confident detection of remote homologs. It has been noted that the maximal number of such homologs is recovered when several distant members of the already defined family are tried as starters in the PSI-BLAST similarity searches, perhaps because this helps to overcome the effect of random fluctuations in sequences and in match scores (Aravind and Koonin 1999; Wrabl and Grishin 2001). What is sought in this approach is a set within which every sequence matches at least one other sequence with a score higher than (or probability lower than) a cutoff, without a requirement for every pair of sequences to pass this threshold. Such a group can be pictured as a network in which the edges correspond to the matches satisfying the threshold requirement, and nodes may represent either single sequences or groups of sequences with a very high degree of similarity (Fig. 2).
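The grouping criterion described above amounts to single-linkage clustering: a group is a connected component of the network whose edges are the matches that pass the cutoff. A minimal sketch follows, assuming that matches are supplied as (sequence, sequence, E-value) triples; the function name, the family labels, and the E-values are hypothetical.

```python
def single_linkage_groups(matches, e_cutoff):
    """Return groups in which every member is linked to at least one other
    member by a match with E-value <= e_cutoff (connected components)."""
    neighbours = {}
    for a, b, e_value in matches:
        neighbours.setdefault(a, set())
        neighbours.setdefault(b, set())
        if e_value <= e_cutoff:
            neighbours[a].add(b)
            neighbours[b].add(a)

    groups, seen = [], set()
    for start in neighbours:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:                      # depth-first walk over passing edges
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups

# Hypothetical matches: famA-famC fails the cutoff directly but is linked via famB.
toy_matches = [
    ("famA", "famB", 1e-20),
    ("famB", "famC", 1e-8),
    ("famA", "famC", 0.5),
    ("famD", "famA", 3.0),
]
groups = single_linkage_groups(toy_matches, e_cutoff=1e-3)
print(sorted(sorted(g) for g in groups))
```

Note that famA and famC end up in one group although their direct match is not significant, which is exactly the property that makes the approach sensitive to remote relationships and also the reason each linking match must be inspected.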

Figure 2.
Statistical significance of sequence similarities between CAZy members that belong to GT-A. Labels on the arrows are PSI-BLAST E-values upon first-time passing the profile-inclusion threshold. Broken lines indicate marginal statistical significance ( ...

We initiated the NR database search with GTs from the Fringe family, and ran PSI-BLAST to convergence, selecting diverse, yet statistically significant matches, starting new rounds of search, collecting novel homologs, and using them as queries until no new sequences could be detected. In addition, we searched the NR database with all protein sequences of known three-dimensional structure belonging to the NDS fold (http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.d.hj.A.html). At a later stage, representative members of all remaining GT families were also analyzed in this manner. The result of this analysis, shown in Figures 1 and 2, is that at least 20 CAZy families appear to be related, at the level of statistically significant sequence similarity, to the GT-A proteins. A few more families tentatively belong to the same superfamily (see Fig. 2 legend for details).
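The search strategy above is, in outline, a closure computation: every newly detected homolog becomes a query in turn until a round yields nothing new. The sketch below shows only that control flow. The `search` argument is a hypothetical stand-in for one PSI-BLAST run to convergence returning statistically significant hits, and the toy lookup table and its labels are invented for illustration. Unlike the procedure in the text, which selected diverse matches as new queries, this simplified version requeries with every hit.

```python
from collections import deque

def collect_homologs(seeds, search):
    """Repeat searches, reusing each new hit as a query, until no new
    sequences are detected. `search(query)` must return an iterable of hits."""
    found = set(seeds)
    queue = deque(seeds)
    while queue:
        query = queue.popleft()
        for hit in search(query):
            if hit not in found:
                found.add(hit)
                queue.append(hit)
    return found

# Hypothetical hit table standing in for real database searches.
toy_hits = {
    "fringe": ["gt31_a", "gt25_a"],
    "gt25_a": ["gt2_a"],
    "gt2_a": ["fringe"],
}
result = collect_homologs(["fringe"], lambda q: toy_hits.get(q, []))
print(sorted(result))
```

The loop is guaranteed to stop on a finite database because each sequence is queued at most once.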

The lowest level of organization in the network shown in Figure 2 is occupied by well-conserved, fully linked protein families, with the average similarity score, s = 240, and probability of random match in the context of the NR search ranging from 1e−15 to 1e−60. These families correspond almost precisely to the CAZy families (CAZy families GT-2, GT-8, GT-25, and GT-45 are, however, grouped together in our approach, and GT-31 is split into three groups). Important, although not crucial for our definition of the extended GT-A superfamily, was the CAZy family GT-2, which includes bacterial spore coat polysaccharide biosynthesis protein SpsA with known structure (PDB entry 1QGQ). This family is connected to the largest number of other CAZy families. Inspection of the network allows one to define smaller connected groups, occupying an intermediate position between a CAZy family level and the GT-A superfamily level. One such group, represented by families GlmU, RmlA, sialic acid activating synthetase, cytidylyltransferase, and MobA at the bottom right corner of Figure 2, is interesting in that it contains almost all GT-A proteins with nucleotidyltransferase and other non-GT activities. In addition to this functional distinction, members of this cluster appear to share a distinctive structural feature (see below).

In total, there are >5000 members of GT-A in the NR database. They are found in all completely sequenced genomes, in numbers varying from only four in Mycoplasma genitalium and in Chlamydia spp. (the latter are the only group of species that has only GT-A nucleotidyltransferases, but no GT-A glycosyltransferases), to 33 members in a large bacterium such as Escherichia coli strain O157 and 59 homologs in Drosophila melanogaster. Thus, the GT-A superfamily appears to be one of the most commonly used protein superfamilies and spatial folds in cellular life, making the list of 20 most common folds in both Drosophila and E. coli, and the 10 most common folds (accounting for about 1% of all ORFs) in bacterium Helicobacter pylori. Curiously, surveys of fold usage in complete genomes, such as Parts List at Yale University (http://bioinfo.mbb.yale.edu/partslist/), ignore it almost completely, perhaps relying on the earlier releases of the SCOP database, in which the GT-A proteins (the NDS fold, as it is currently known in SCOP) were not classified. In the existing collections of conserved protein families, such as PFAM (Bateman et al. 2002), PRODOM (Servant et al. 2002), and PROTOMAP (Yona et al. 2000), GT-A is split into many families similar to the CAZy families, and nucleotidyltransferase families show no connection with glycosyltransferases. The remote, yet statistically significant sequence similarities, strongly indicative of the monophyletic origin of GT-A that we report here, have apparently not been fully appreciated until now.

Conserved sequence motifs in GT-A, their structural basis and functional significance

There are three regions of sequence conservation shared by all members of the GT-A. Taken together, these sequence regions comprise a substantial part of the structural core of these enzymes. We therefore describe that core, taking into account all 12 families recognized within the NDS fold in the SCOP database, and then discuss the universally conserved sequence motifs in more detail.

The three-dimensional structure of the GT-A members is based on the Rossmann-like fold, one of the most common arrangements of protein spatial structure, observed in dozens of diverse families of enzymes (Lesk 1995). In the most basic arrangement, extended β-stranded and α-helical regions alternate along the length of the protein, with all strands forming a central relatively planar β-sheet, and helices filling two layers, one on each side of the plane. As with many other Rossmann-like folds, the amino-terminal β-strand of the GT-A proteins is located in the middle of the sheet, and the strand topology is 321465, although the seventh strand may be added (Fig. 3). One of the strands in some Rossmanoids (the sixth in the case of GT-A proteins) is antiparallel to all other strands. Yet another typical feature of Rossmanoid enzymes is that the functionally important, conserved residues are often located in the carboxy-termini of the β-strands or in the adjoining loops (Lesk 1995). A subset of the GT-A proteins, consisting of the structures from the bottom right cluster in Figure 2, conform to this plan almost precisely, with an occasional addition of an extra α-helix or a few β-strands, which, however, seem to be involved mostly in auxiliary roles such as multimerization (Blankenfeldt et al. 2000; Jelakovic and Schulz 2001; Fig. 4E,F).
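A topology string such as 321465 lists the strands, numbered by their order in the sequence, as they lie side by side in the sheet. As a small illustration of how to read it, the sketch below derives which strands are spatial neighbours and which neighbouring pairs are not consecutive in the sequence; the function names are ours, and strand direction (the antiparallel sixth strand) is not encoded in the string and therefore not modeled here.

```python
def sheet_neighbours(order):
    """Pairs of strands that sit next to each other in the sheet.
    `order` is a topology string such as "321465" (single-digit strand numbers)."""
    strands = [int(ch) for ch in order]
    return [(strands[i], strands[i + 1]) for i in range(len(strands) - 1)]

def crossovers(order):
    """Neighbouring pairs that are NOT consecutive in the polypeptide chain."""
    return [(a, b) for a, b in sheet_neighbours(order) if abs(a - b) != 1]

gt_a = "321465"
print(sheet_neighbours(gt_a))   # spatial neighbours, left to right
print(crossovers(gt_a))         # pairs joined by a longer connection
print(gt_a.index("1"))          # 0-based position of the amino-terminal strand
```

For 321465 the amino-terminal strand occupies the third of six positions, flanked by strands 2 and 4, which is the sense in which it lies in the interior of the sheet rather than at an edge.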

Figure 3.
Topology diagram of the NDS fold adopted by the GT-A proteins. Orange arrows indicate β-strands (1–7) forming the main β-sheet of the Rossmann-like fold. Blue cylinders indicate the most conserved α-helices. Magenta arrows ...

Figure 4.
Representative structures of the GT-A proteins with NDS fold. (A) Spore coat polysaccharide biosynthesis protein SpsA (PDB 1QGQ, SCOP NDS family 1 [http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.d.hj.b.b.A.html]); (B) Lactose synthase (PDB 1J8X, SCOP ...

An elaboration of this structure that is observed in many families of the GT-A superfamily consists of a specific addition of β-strands, which form an extra β-hairpin or a small β-sheet adjacent to the Rossmann-like core at an angle to the main β-sheet (Figs. 3, 4A–D), close to the residues playing a catalytic role. We call this formation the β-lip. One of the strands producing the β-lip always follows the fourth strand of the Rossmann core and the adjoining DxD signature (Fig. 3, and see below). The other strand, sufficient for formation of a two-stranded lip, is typically located to the carboxyl terminus of the Rossmann-core strand 7; this order is only permuted in the β-1,4 galactosyltransferase family (e.g., PDB entry 1J8X; Fig. 4B). The lip can be enlarged by additional β-strands (typically not more than three; Fig. 4A,C). Among the enzymes with a β-lip, several families contain additional α-helices and β-hairpins, which again appear to play auxiliary roles with regard to the catalytic reaction. Curiously, at the sequence similarity level as well as from the structural point of view, proteins lacking the β-lip appear to form a more compact group than the ones with the β-lip, despite the fact that the former group includes a diverse range of GT-A enzymes, including all of those that are not technically glycosyltransferases (Fig. 2).

The first conserved sequence region in the GT-A enzymes is closest to the amino terminus and encompasses β-strands S1–S3, their carboxy-terminal loops, and connecting α-helices (Fig. 5). The highly conserved residues are in the strands (mostly hydrophobic side chains) and in the loop after S1 (mostly charged side chains). Analysis of those crystal structures that contain a bound nucleotide indicates that the charged residues in the loop after S1 (S1L) and, in some cases, also in loops after S2 or S3, are making direct contacts with nucleotide base and/or sugar. The presence of glycine residues in S1L has been detected in several cases, and a broad similarity of this loop to the glycine-rich loops of the Walker-type NTP-binding sites (Saraste et al. 1990) has been discussed (Koonin 1995; Kinoshita et al. 1999; Lake et al. 2000). As seen in Figure 5, there are a few partially conserved glycine residues in S1L, but this loop is, in most cases, not glycine rich. More importantly, the P-loops in Walker-type NTPases bind to the phosphate moiety of the bound nucleotide, not to the base or sugar as in the present case, so there is probably no evolutionary or functional connection between the two types of loops.

Figure 5.
Multiple sequence alignment of GT-A members. Every GT-A member represented in the genomes of Drosophila melanogaster (DM) and Escherichia coli (EC) is shown, as well as sequences from other species with known three-dimensional structures. Species abbreviations: ...

The second conserved sequence region consists of S4 followed by the DxD signature in a short loop (S4L; Figs. 4 and 5). Conservation of this tripeptide is not absolute, as it is aligned to DDD, Dxx, or xxD in subsets of the GT-A. A divalent cation, most commonly Mg2+ or Mn2+, is present in many crystals, and this ion invariably contacts one or two of the acidic residues in the S4L, as well as the phosphate group(s) of the bound nucleotide. The reaction of glycosyl transfer requires this bound metal ion, and does not occur when the acidic residues are mutated (Saxena and Brown 1997; Shibayama et al. 1998; Wiggins and Munro 1998; Hagen et al. 1999).

In a subset of NDS-fold proteins that lack the β-lip, S4L is followed by an α-helix. When a β-lip is present, S4L is followed by strand A of the lip. From this point to the carboxy-termini of the proteins, there is considerable sequence diversity and, accordingly, the structure of the rest of the GT-A superfamily is less well conserved (Fig. 3), except for the two helices following S6. These elements, connected by a loop, form the last conserved sequence region (Fig. 4), characterized by partial preservation of charged residues within each of the two helices. Whenever the bound sugar, modeling the transferable moiety of the nucleoside diphosphosugar donor, is present in crystals, it makes direct contacts with these charged residues in the helical region.

Thus, all three conserved sequence regions in GT-A appear to have structural roles, namely stabilization of the Rossmann-like core, and functional significance, namely interaction with a bound donor of a sugar and participation in divalent cation-mediated glycosyl transfer. None of the macromolecular sugar acceptors have been cocrystallized with a GT-A enzyme, but it is conceivable that the carboxy-terminal halves of the GT-A proteins, so variable at both sequence and structure levels, mediate interactions with the diverse substrates of these enzymes.

GT-A and GT-B have, respectively, duplicated and stand-alone Rossmann-like fold, but lack discernible sequence similarity

Probabilistic modeling and iterative database searches allowed us to discover a deep evolutionary relationship between very diverse enzymes forming the GT-A superfamily. Recently, a conceptually similar study resulted in extension of the GT-B superfamily, another group of GTs with subtle sequence similarity, a structure that is best described as a duplicated Rossmann-like fold, and broad distribution in all living species (Wrabl and Grishin 2001). We were interested in the fact that no GT-A sequences showed up in searches initiated by GT-B family members or vice versa. Therefore, we explored more permissive cutoffs and different scoring schemes in our searches. We also attempted to force local alignments between GT-A and GT-B, using several approaches. First, we created specialized databases, consisting only of members of either superfamily, and used each database as the search space, interrogating it with members of the other superfamily. Second, we attempted to match profiles to families and vice versa using the HMMer package (Eddy 1998). Third, we used the prof_sim program (Yona and Levitt 2002; G. Yona, pers. comm.) that matches PSI-BLAST checkpoint profiles directly. Finally, we created conserved sequence blocks from the multiple alignments of both superfamilies and used the LAMA server (Pietrokovski 1996) to compare them with the databases of other conserved sequence blocks. None of these approaches resulted in detection of sequence similarities between GT-A and GT-B. Even in a forced comparison of two superfamilies to each other, the matches did not correspond to regions conserved in a majority of members of each superfamily, nor did they highlight any residues with known structural or functional significance.

GT-A and GT-B thus appear to be unrelated, despite the similarity of their spatial folds and general ease with which duplication is believed to occur in evolution of sequences (Andrade et al. 2001) and structural domains (Heringa and Taylor 1997). It is notable that, whereas both β-sheets in GT-B have a very common strand topology (parallel 321456 arrangement found in 12 folds in the SCOP database, mostly, but not exclusively, Rossmanoids), the GT-A topology (321465, with antiparallel sixth strand) seems to be unique. Other properties setting the two families apart include (1) inconsistent presence of metal ions in GT-B and lack of conserved contacts between metal ions and side chains of acidic residues; (2) pyridoxal-dependent mechanism of glycosyl transfer in at least one member of GT-B, glycogen phosphorylase (Klein et al. 1986; Withers et al. 2002); and (3) complex mode of GT-B interaction with sugar donors, involving both Rossmann-like domains and requiring their motions (Ha et al. 2000; Morera et al. 2001). Combination of these structural and functional differences together with the apparent lack of global or local sequence similarity strongly indicates that these two superfamilies have evolved toward their similar molecular function independently, or at least, that their last common ancestor, if it existed, is extremely ancient and not amenable to sequence-based reconstruction.

Several proteins with well-documented roles in the development of multicellular organisms belong to GT-A and GT-B. The Drosophila Fringe/Brainiac family and the Egghead protein belong to GT-A, Fukutin belongs to GT-B, and the Exostosin/Tout-velu family, conserved in bacteria and multicellular eukaryotes, contains both GT-A- and GT-B-related domains, with multiple GT activities in one protein (at the time of the revision of this manuscript, the two-domain structure of exostosin had already been reflected in the CAZy database by the addition of family GT-64). A three-dimensional model of Fringe and multiple-sequence alignments of other proteins are available online as Supplemental information.

GT-C superfamily: Integral membrane glycosyltransferases with modified DxD motif in the first extracytoplasmic loop

The GT-C superfamily links eight CAZy families (Fig. 6). All of these families consist of large hydrophobic proteins located in the ER or on the plasma membrane (Strahl-Bolsinger et al. 1993; Takahashi et al. 1996; Maeda et al. 2001), with 8 to 13 predicted transmembrane domains. Figure 7 shows sequence-similarity information and the alignment of the conserved amino-terminal extracytoplasmic loop in GT-C sequences, the best-preserved sequence element in all eight families. The conserved element in that long loop is a modified DxD signature, aligned to ExD, DxE, DDx, or DEx residues. The role of this motif in an important human enzyme, dolichol-phosphate-mannose-dependent mannosyltransferase (GPI-MT-I), has been studied (Maeda et al. 2001). GPI-MT-I is essential for the formation of glycosylphosphatidylinositol (GPI), which substitutes the carboxyl terminus of many newly synthesized cell-surface proteins and serves as their membrane anchor (Orlean 1990; McConville and Menon 2000). Change of either of the aspartic acid residues in the DxD motif to alanine abolishes the mannosyltransferase activity of GPI-MT-I (Maeda et al. 2001). The mechanistic basis of catalysis in GT-C enzymes is unclear, and it is not known whether the conserved acidic residues in the putative DxD motif bind a divalent cation.

Figure 6.
Statistical significance of sequence similarities between CAZy members that belong to GT-C. Designations are as in Figure 2.
Figure 7.
Multiple sequence alignment of GT-C members. Species abbreviations: (Mm) Mus musculus, (Os) Oryza sativa, (Dm) Drosophila melanogaster, (Sp) Schizosaccharomyces pombe, (Sc) Saccharomyces cerevisiae, (Tb) Trypanosoma brucei, (Hs) Homo sapiens, (At) Arabidopsis ...

Close to the carboxyl-terminal end of the same loop, there is another conserved acidic residue (Fig. 7). Partially conserved, positively charged residues have been detected in other extracytoplasmic loops (data not shown). These conserved sites may also be involved in catalysis, or in interactions with the sugar donors and acceptors, and are good targets for site-directed mutagenesis studies.

There is no evidence of a common evolutionary origin of the putative DxD loop in GT-C and the DxD motif in GT-A. The DxD tripeptide in the GT-C superfamily is located at the carboxy-terminal end of the first transmembrane helix, and is often followed by a small patch of hydrophobic amino acids, which are predicted to be part of the same extracellular loop. This arrangement is reminiscent of the DxD followed by a β-strand in a subset of GT-A enzymes, but no specific, statistically significant sequence similarity between these regions in GT-A and GT-C could be detected in searches of the NR database.

GT-C has a more limited phyletic distribution than GT-A and GT-B. Representatives of GT-C are found in all completely sequenced eukaryotic genomes, but are missing from archaea, and the rare occurrences of these enzymes in prokaryotes are limited to parasitic mycobacteria. The list of known biochemical specificities of the GT-C enzymes is also no match for the remarkable variety of specificities in GT-A and GT-B, as most GT-C enzymes are only known to synthesize the polysaccharide derivatives of dolichol phosphate. Thus, GT-C appears to be a specialized, evolutionarily recent group of GTs that may have acquired the DxD motif either by convergent evolution, or by a recombinational graft of a short sequence into the extracellular loop of a membrane protein.

Concluding remarks: Trends in glycosyltransferase sequence evolution

In this work, we have delineated three large, diverse, most likely monophyletic superfamilies of glycosyltransferases, which together account for >75% of the families recognized by the CAZy database. In addition to the bona fide glycosyltransferases, GT-A and GT-B sequence superfamilies contain enzymes that utilize activated sugars as substrates, but have distinct enzymatic activities. This observation underscores the notion of the complex interplay of divergence and convergence in enzyme evolution, in which, on the one hand, similar sequences may have different enzymatic specificities, even belonging to different EC classes, and, on the other hand, enzymes with very similar specificities may have dramatically different sequences (Galperin et al. 1998; Copley and Bork 2000; Nagano et al. 2002).

Among the remaining 15 CAZy families, a few smaller superfamilies can be recognized. In at least one case (CAZy family GT-36), the fold can be predicted and appears to be different from both GT-A and GT-B (Fig. 1; A. Mushegian, unpubl.). Glycosyltransferase activities have apparently evolved independently on several occasions on the basis of different sequences and structures (two distinct Rossmanoids, as in GT-A and GT-B; integral membrane proteins, as in GT-C; and possibly other conserved domains; Fig. 1). GT-A and GT-B, however, stand out as the most diverse and ubiquitous groups of glycosyltransferases, found in all life forms with the exception of a few small parasitic bacteria with reduced biosynthetic capacity. Reconstruction of a detailed phylogeny of both GT-A and GT-B is, however, hampered by three factors: insufficient signal retained in multiple alignments of each superfamily; a very small number of shared derived characters on which to base a cladistic classification (Aravind et al. 2002a), although a β-lip appears to be a good candidate for one such character within GT-A; and apparently frequent horizontal transfer of biosynthetic enzymes, including GTs, in the early evolution of bacteria and archaea (Koonin et al. 1997, 2001). Analysis of the phyletic distribution of both superfamilies and of phylogenetic distances between individual sequences seems to indicate an ancient origin of both GT-A and GT-B (A. Mushegian, unpubl.; see http://www.ncbi.nlm.nih.gov/cgi-bin/COG/palox?txt=cog0463 for more information on a clearly monophyletic, ubiquitous subset of GT-A). On the basis of phyletic distribution, we speculate that GT-A-like and GT-B-like Rossmanoids might have been present in the Last Universal Common Ancestor (LUCA) of all present-day life forms.

The odyssey of the β-DxD-β metal-binding motif in the active center of glycosyltransferases and nucleotidyltransferases is of special interest. The LUCA had to possess a nucleotidyltransferase involved in copying its genomic nucleic acid; it has been proposed that such copying proceeded via an RNA intermediate (Leipe et al. 1999), and that this nucleotidyltransferase most likely had the palm-domain ferredoxin-like fold, consisting of a four- or five-stranded β-sheet with two α-helices packed against one side (Aravind et al. 2002b). A motif with 1–3 acidic residues, flanked by two β-strands and involved in coordination of two nearby metal ions, is found in all palm-related domains, including RNA-dependent RNA polymerases (viral type), reverse transcriptases, β-family DNA polymerases, poly(A) polymerases, kanamycin nucleotidyltransferases, adenylyl cyclases, and GGDEF proteins (Aravind et al. 2002b). A few of the β-family polymerases and adenylyl cyclases (i.e., the evolutionarily more recent members of the palm-domain family) have this motif in the DxD form, and, curiously, these proteins are even seen, although at a statistically insignificant level, in PSI-BLAST searches with GT-A enzymes. The occurrence of DxD between two strands in the ferredoxin fold may be an indication of an independent, perhaps the most ancient, cooptation of this motif for nucleotidyltransferase function, followed in evolution by the formation of the β-lip in some of the GT-A enzymes and the emergence of the first extracellular loop in GT-C.

On a more practical note, we have identified regions of subtle sequence conservation, and their structural and functional correlates, in a number of proteins with important biochemical and developmental roles. Investigation of diverse biochemical processes of consequence to human health, from bacterial cell wall biosynthesis to hereditary multiple exostoses, can now proceed via mechanistic studies of precisely mutated glycosyltransferases of the GT-A, GT-B, and GT-C superfamilies.

The approach to classification used in the CAZy database, the authoritative source of information about glycosyltransferases, is to capture both evolutionary divergence, by grouping enzymes on the basis of high sequence similarity, and functional variation, by separating retaining and inverting enzymes even if their sequences are similar. Our current classification is based only on sequence relationships, and in many cases the distance between related sequences is too great for the exact mechanism of glycosyl transfer to be extrapolated. As more information emerges from mechanistic studies of various glycosyltransferases, sequence-based computational prediction of inverting versus retaining mechanisms in these enzymes may also become feasible.

Materials and methods

Database searches

The PSI-BLAST (Altschul et al. 1997) program and the NR protein database from NCBI were used for the protein sequence database searches, with default parameters, except that the cutoff for inclusion into the probabilistic model (-h parameter) was typically set at 0.02, and composition-based statistics (-t parameter; Schaffer et al. 2001) were not always used. Sequences detected in PSI-BLAST searches were collected into a database that was exhaustively matched to itself and clustered into single-linkage groups on the basis of gapped BLAST pairwise distances of 75 or more, using the grouper program from the SEALS (Walker and Koonin 1997) package. Representative sequences from each cluster were used as queries in new rounds of PSI-BLAST searches, until no new sequences could be detected. Local Hidden Markov Models were built with the hmmbuild program and used to scan the database with the hmmsearch program from the HMMer (Eddy 1998) package.
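The single-linkage grouping step can be illustrated with a small union-find sketch. This is not the SEALS grouper program; the function name, identifiers, and values below are hypothetical, and the sketch assumes that the cutoff of 75 applies to a pairwise similarity score for which higher values mean closer sequences.

```python
def single_linkage_groups(ids, scored_pairs, threshold=75.0):
    """Group ids so that any pair scoring >= threshold shares a group."""
    parent = {seq_id: seq_id for seq_id in ids}

    def find(x):
        # Walk to the root, compressing the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for seq_id in ids:
        groups.setdefault(find(seq_id), []).append(seq_id)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical identifiers and scores, for illustration only.
ids = ["seqA", "seqB", "seqC", "seqD", "seqE"]
pairs = [
    ("seqA", "seqB", 120.0),
    ("seqB", "seqC", 80.0),   # links C to A through B
    ("seqA", "seqC", 40.0),   # below threshold, but C is already linked
    ("seqD", "seqE", 60.0),   # below threshold: D and E stay apart
]
print(single_linkage_groups(ids, pairs))
```

Single linkage merges two groups as soon as any one pair between them passes the cutoff, which is why seqC joins seqA even though their direct score is low. One representative per group would then seed the next search round.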

Multiple sequence alignment

Related sequences with a high degree of similarity (>50% identity along their entire length) were aligned using the T-Coffee (Notredame et al. 2000) program. For families with moderate similarity, representative sequences were aligned with the MACAW (Schuler et al. 1991) program. Multiply aligned families were aligned together, if there was statistical evidence of their distant relationship (see text), using T-Coffee or the profile-to-profile alignment option of the CLUSTALX (Thompson et al. 1997) program. Local similarities between checkpoint profiles generated in the course of PSI-BLAST searches were also investigated using the prof_sim (Yona and Levitt 2002) program. Correct alignment of the sequence motifs was sometimes checked by inspection of the pairwise matches in the PSI-BLAST outputs.
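The >50% identity criterion can be made concrete with a short helper that works on two rows of an existing alignment. The text does not say how identity was normalized, so the choice below (identical columns divided by all columns in which at least one row has a residue) is an assumption, as are the function name and the example sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding the same residue in both rows.

    Columns where both rows are gaps are ignored; columns with a gap in one
    row count toward the length but never as identities.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        columns += 1
        if a == b:
            identical += 1
    return 100.0 * identical / columns if columns else 0.0


# Hypothetical aligned fragments, for illustration only.
identity = percent_identity("MKT-LV", "MRTALV")
print(round(identity, 1), identity > 50.0)
```

Other normalizations (for example, dividing by the length of the shorter sequence) give higher values for the same pair, so the cutoff is only meaningful once the denominator is fixed.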

Secondary structure

Prediction of the secondary structure was performed using the JPRED metaserver (Cuff et al. 1998; Bujnicki et al. 2001) and the PHD algorithm (Rost 1996). In the latter case, only predictions with an accuracy score of seven or more were considered. Transmembrane segments were predicted by PHD and by the HMMTOP (Tusnady and Simon 2001) program.
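Filtering a prediction by the cutoff of seven might look like the sketch below. It assumes that the prediction is available as two equal-length strings, one of per-residue states and one of per-residue scores on a 0 to 9 scale; this representation, the function name, and the example values are hypothetical and do not reproduce actual PHD output.

```python
def mask_low_confidence(states, reliability, cutoff=7, mask="-"):
    """Keep predicted states whose per-residue score is >= cutoff."""
    if len(states) != len(reliability):
        raise ValueError("states and reliability must have equal length")
    return "".join(
        state if int(score) >= cutoff else mask
        for state, score in zip(states, reliability)
    )


# Hypothetical prediction: H = helix, E = strand, L = loop; digits are scores.
print(mask_low_confidence("LLEEEELLHHHH", "978986357899"))
```

Positions scoring below the cutoff are replaced by the mask character rather than dropped, so the filtered string stays in register with the sequence.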

Structure modeling

Information on sequence conservation and predicted secondary structure of the fruit-fly Fringe gene product was used to model and evaluate its three-dimensional structure with the WHATIF (Vriend 1990) package. The PDB structures 1FOA and 1G8O served as the templates.

Electronic supplemental material

Lists of proteins, with GenBank identifiers, included in the GT-A and GT-C superfamilies are in GTA.txt and GTC.txt. Alignments of selected GT-A and GT-B family members are in fringe.doc and fukutin.doc, and the coordinates of the three-dimensional model of Drosophila Fringe are in fringemodel.pdb.

Acknowledgments

We thank V. Panin for asking about the Fringe family, L. Aravind for early discussions of sequence relationships of the exostosin superfamily, G. Yona for the prof_sim program, and an anonymous reviewer for many suggestions on improving this manuscript.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.

Notes

Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.0302103.

References

  • Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25 3389–3402. [PMC free article] [PubMed]
  • Andrade, M.A., Perez-Iratxeta, C., and Ponting, C.P. 2001. Protein repeats: Structures, functions, and evolution. J. Struct. Biol. 134 117–131. [PubMed]
  • Aravind, L. and Koonin, E.V. 1999. Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J. Mol. Biol. 287 1023–1040. [PubMed]
  • Aravind, L., Anantharaman, V., and Koonin, E.V. 2002a. Monophyly of class I aminoacyl tRNA synthetase, USPA, ETFP, photolyase, and PP-ATPase nucleotide-binding domains: Implications for protein evolution in the RNA world. Proteins 48 1–14. [PubMed]
  • Aravind, L., Mazumder, R., Vasudevan, S., and Koonin, E.V. 2002b. Trends in protein evolution inferred from sequence and structure analysis. Curr. Opin. Struct. Biol. 12 392–399. [PubMed]
  • Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths-Jones, S., Howe, K.L, Marshall, M., and Sonnhammer, E.L. 2002. The Pfam protein families database. Nucleic Acids Res. 30 276–280. [PMC free article] [PubMed]
  • Bellaiche, Y., The, I., and Perrimon, N. 1998. Tout-velu is a Drosophila homologue of the putative tumour suppressor EXT-1 and is needed for Hh diffusion. Nature 394 85–88. [PubMed]
  • Blankenfeldt, W., Asuncion, M., Lam, J.S., and Naismith, J.H. 2000. The structural basis of the catalytic mechanism and regulation of glucose-1-phosphate thymidylyltransferase (RmlA). EMBO J. 19 6652–6663. [PMC free article] [PubMed]
  • Bourne, Y. and Henrissat, B. 2001. Glycoside hydrolases and glycosyltransferases: Families and functional modules. Curr. Opin. Struct. Biol. 11 593–600. [PubMed]
  • Breton, C. and Imberty, A. 1999. Structure/function studies of glycosyltransferases. Curr. Opin. Struct. Biol. 9 563–571. [PubMed]
  • Breton, C., Bettler, E., Joziasse, D.H., Geremia, R.A., and Imberty, A. 1998. Sequence-function relationships of prokaryotic and eukaryotic galactosyltransferases. J. Biochem. 123 1000–1009. [PubMed]
  • Bruckner, K., Perez, L., Clausen, H., and Cohen, S. 2000. Glycosyltransferase activity of Fringe modulates Notch-Delta interactions. Nature 406 411–415. [PubMed]
  • Bujnicki, J.M., Elofsson, A., Fischer, D., and Rychlewski, L. 2001. Structure prediction meta server. Bioinformatics 17 750–751. [PubMed]
  • Campbell, J.A., Davies, G.J., Bulone, V., and Henrissat, B. 1997. A classification of nucleotide-diphospho-sugar glycosyltransferases based on amino acid sequence similarities. Biochem. J. 326 929–939. [PMC free article] [PubMed]
  • Cid, E., Gomis, R.R., Geremia, R.A., Guinovart, J.J., and Ferrer, J.C. 2000. Identification of two essential glutamic acid residues in glycogen synthase. J. Biol. Chem. 275 33614–33621. [PubMed]
  • Copley, R.R. and Bork, P. 2000. Homology among (βα)(8) barrels: Implications for the evolution of metabolic pathways. J. Mol. Biol. 303 627–641. [PubMed]
  • Cuff, J.A., Clamp, M.E., Siddiqui, A.S., Finlay, M., and Barton, G.J. 1998. Jpred: A consensus secondary structure prediction server. Bioinformatics 14 892–893. [PubMed]
  • Eddy, S.R. 1998. Profile hidden Markov models. Bioinformatics 14 755–763. [PubMed]
  • Galperin, M.Y., Walker, D.R., and Koonin, E.V. 1998. Analogous enzymes: Independent inventions in enzyme evolution. Genome Res. 8 779–790. [PubMed]
  • Ha, S., Walker, D., Shi, Y., and Walker, S. 2000. The 1.9 Å crystal structure of Escherichia coli MurG, a membrane-associated glycosyltransferase involved in peptidoglycan biosynthesis. Protein Sci. 9 1045–1052. [PMC free article] [PubMed]
  • Hagen, F.K., Hazes, B., Raffo, R., deSa, D., and Tabak, L.A. 1999. Structure-function analysis of the UDP-N-acetyl-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase. Essential residues lie in a predicted active site cleft resembling a lactose repressor fold. J. Biol. Chem. 274 6797–6803. [PubMed]
  • Heringa, J. and Taylor, W.R. 1997. Three-dimensional domain duplication, swapping and stealing. Curr. Opin. Struct. Biol. 7 416–421. [PubMed]
  • Jelakovic, S. and Schulz, G.E. 2001. The structure of CMP: 2-keto-3-deoxy-manno-octonic acid synthetase and of its complexes with substrates and substrate analogs. J. Mol. Biol. 312 143–155. [PubMed]
  • Ju, B.G., Jeong, S., Bae, E., Hyun, S., Carroll, S.B., Yim, J., and Kim, J. 2000. Fringe forms a complex with Notch. Nature 405 191–195. [PubMed]
  • Karlin, S. and Altschul, S.F. 1990. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. 87 2264–2268. [PMC free article] [PubMed]
  • Kinoshita, K., Sadanami, K., Kidera, A., and Go, N. 1999. Structural motif of phosphate-binding site common to various protein superfamilies: All-against-all structural comparison of protein-mononucleotides. Protein Eng. 12 11–14. [PubMed]
  • Klein, H.W., Im, M.J., and Palm, D. 1986. Mechanism of the phosphorylase reaction. Utilization of D-gluco-hept-1-enitol in the absence of primer. Eur. J. Biochem. 157 107–114. [PubMed]
  • Koonin, E.V. 1995. Multidomain organization of eukaryotic guanine nucleotide exchange translation initiation factor eIF-2B subunits revealed by analysis of conserved sequence motifs. Protein Sci. 4 1608–1617. [PMC free article] [PubMed]
  • Koonin, E.V., Mushegian, A.R., Galperin, M.Y., and Walker, D.R. 1997. Comparison of archaeal and bacterial genomes: Computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea. Mol. Microbiol. 25 619–637. [PubMed]
  • Koonin, E.V., Makarova, K.S., and Aravind, L. 2001. Horizontal gene transfer in prokaryotes: Quantification and classification. Annu. Rev. Microbiol. 55 709–742. [PubMed]
  • Lake, M.W., Temple, C.A., Rajagopalan, K.V., and Schindelin, H. 2000. The crystal structure of the Escherichia coli MobA protein provides insight into molybdopterin guanine dinucleotide biosynthesis. J. Biol. Chem. 275 40211–40217. [PubMed]
  • Law, S.K.A. and Reid, K.B.M. 1995. Complement, 2nd ed. IRL Press, Oxford, UK.
  • Leipe, D.D., Aravind, L., and Koonin, E.V. 1999. Did DNA replication evolve twice independently? Nucleic Acids Res. 27 3389–3401. [PMC free article] [PubMed]
  • Lesk, A.M. 1995. NAD-binding domains of dehydrogenases. Curr. Opin. Struct. Biol. 5 775–783. [PubMed]
  • Maeda, Y., Watanabe, R., Harris, C.L., Hong, Y., Ohishi, K., Kinoshita, K., and Kinoshita, T. 2001. PIG-M transfers the first mannose to glycosylphosphatidylinositol on the lumenal side of the ER. EMBO J. 20 250–261. [PMC free article] [PubMed]
  • McCarter, J.D. and Withers, S.G. 1994. Mechanisms of enzymatic glycoside hydrolysis. Curr. Opin. Struct. Biol. 4 885–892. [PubMed]
  • McConville, M.J. and Menon, A.K. 2000. Recent developments in the cell biology and biochemistry of glycosylphosphatidylinositol lipids. Mol. Membr. Biol. 17 1–16. [PubMed]
  • Moloney, D.J., Panin, V.M., Johnston, S.H., Chen, J., Shao, L., Wilson, R., Wang, Y., Stanley, P., Irvine, K.D., Haltiwanger, R.S., et al. 2000. Fringe is a glycosyltransferase that modifies Notch. Nature 406 369–375. [PubMed]
  • Morera, S., Lariviere, L., Kurzeck, J., Aschke-Sonnenborn, U., Freemont, P.S., Janin, J., and Ruger, W. 2001. High resolution crystal structures of T4 phage β-glucosyltransferase: Induced fit and effect of substrate and metal binding. J. Mol. Biol. 311 569–577. [PubMed]
  • Mosimann, S.C., Gilbert, M., Dombroswki, D., To, R., Wakarchuk, W., and Strynadka, N.C. 2001. Structure of a sialic acid-activating synthetase, CMP-acylneuraminate synthetase in the presence and absence of CDP. J. Biol. Chem. 276 8190–8196. [PubMed]
  • Munro, S. and Freeman, M. 2000. The notch signalling regulator fringe acts in the Golgi apparatus and requires the glycosyltransferase signature motif DXD. Curr. Biol. 10 813–820. [PubMed]
  • Mushegian, A.R. and Koonin, E.V. 1994. Unexpected sequence similarity between nucleosidases and phosphoribosyltransferases of different specificity. Protein Sci. 3 1081–1088. [PMC free article] [PubMed]
  • Nagano, N., Orengo, C.A., and Thornton, J.M. 2002. One fold with many functions: The evolutionary relationships between TIM barrel families based on their sequences, structures and functions. J. Mol. Biol. 321 741–765. [PubMed]
  • Notredame, C., Higgins, D.G., and Heringa, J. 2000. T-Coffee: A novel method for multiple sequence alignments. J. Mol. Biol. 302 205–217. [PubMed]
  • Olsen, L.R. and Roderick, S.L. 2001. Structure of the Escherichia coli GlmU pyrophosphorylase and acetyltransferase active sites. Biochemistry 40 1913–1921. [PubMed]
  • Orlean, P. 1990. Dolichol phosphate mannose synthase is required in vivo for glycosyl phosphatidylinositol membrane anchoring, O mannosylation, and N glycosylation of protein in Saccharomyces cerevisiae. Mol. Cell. Biol. 10 5796–5805. [PMC free article] [PubMed]
  • Persson, K., Ly, H.D., Dieckelmann, M., Wakarchuk, W.W., Withers, S.G., and Strynadka, N.C. 2001. Crystal structure of the retaining galactosyltransferase LgtC from Neisseria meningitidis in complex with donor and acceptor sugar analogs. Nat. Struct. Biol. 8 166–175. [PubMed]
  • Pietrokovski, S. 1996. Searching databases of conserved sequence regions by aligning protein multiple-alignments. Nucleic Acids Res. 24 3836–3845. [PMC free article] [PubMed]
  • Rost, B. 1996. PHD: Predicting one-dimensional protein structure by profile-based neural networks. Methods Enzymol. 266 525–539. [PubMed]
  • Saraste, M., Sibbald, P.R., and Wittinghofer, A. 1990. The P-loop—A common motif in ATP- and GTP-binding proteins. Trends Biochem. Sci. 15 430–434. [PubMed]
  • Saxena, I.M. and Brown Jr., R.M. 1997. Identification of cellulose synthase(s) in higher plants: Sequence analysis of processive β-glycosyltransferases with the common motif ‘D, D, D35Q(R,Q)XRW’. Cellulose 4 33–49.
  • Saxena, I.M., Brown Jr., R.M., Fevre, M., Geremia, R.A., and Henrissat, B. 1995. Multidomain architecture of β-glycosyl transferases: Implications for mechanism of action. J. Bacteriol. 177 1419–1424. [PMC free article] [PubMed]
  • Schaffer, A.A., Aravind, L., Madden, T.L., Shavirin, S., Spouge, J.L., Wolf, Y.I., Koonin, E.V., and Altschul, S.F. 2001. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res. 29 2994–3005. [PMC free article] [PubMed]
  • Schuler, G.D., Altschul, S.F., and Lipman, D.J. 1991. A workbench for multiple alignment construction and analysis. Proteins 9 180–190. [PubMed]
  • Servant, F., Bru, C., Carrere, S., Courcelle, E., Gouzy, J., Peyruc, D., and Kahn, D. 2002. ProDom: Automated clustering of homologous domains. Brief Bioinform. 3 246–251. [PubMed]
  • Shibayama, K., Ohsuka, S., Tanaka, T., Arakawa, Y., and Ohta, M. 1998. Conserved structural regions involved in the catalytic mechanism of Escherichia coli K-12 WaaO (RfaI). J. Bacteriol. 180 5313–5318. [PMC free article] [PubMed]
  • Sinnott, M.L. 1991. Catalytic mechanisms of enzymic glycosyl transfer. Chem. Rev. 90 1170–1202.
  • Smit, A. and Mushegian, A. 2000. Biosynthesis of isoprenoids via mevalonate in Archaea: The lost pathway. Genome Res. 10 1468–1484. [PubMed]
  • Strahl-Bolsinger, S., Immervoll, T., Deutzmann, R., and Tanner, W. 1993. PMT1, the gene for a key enzyme of protein O-glycosylation in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. 90 8164–8168. [PMC free article] [PubMed]
  • Takahashi, M., Inoue, N., Ohishi, K., Maeda, Y., Nakamura, N., Endo, Y., Fujita, T., Takeda, J., and Kinoshita, T. 1996. PIG-B, a membrane protein of the endoplasmic reticulum with a large lumenal domain, is involved in transferring the third mannose of the GPI anchor. EMBO J. 15 4254–4261. [PMC free article] [PubMed]
  • Tarbouriech, N., Charnock, S.J., and Davies, G.J. 2001. Three-dimensional structures of the Mn and Mg dTDP complexes of the family GT-2 glycosyltransferase SpsA: A comparison with related NDP-sugar glycosyltransferases. J. Mol. Biol. 314 655–661. [PubMed]
  • Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., and Higgins, D.G. 1997. The CLUSTAL_X windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25 4876–4882. [PMC free article] [PubMed]
  • Tusnady, G.E. and Simon, I. 2001. The HMMTOP transmembrane topology prediction server. Bioinformatics 17 849–850. [PubMed]
  • Unligil, U.M. and Rini, J.M. 2000. Glycosyltransferase structure and mechanism. Curr. Opin. Struct. Biol. 10 510–517. [PubMed]
  • Vriend, G. 1990. WHAT IF: A molecular modeling and drug design program. J. Mol. Graph. 8 52–56. [PubMed]
  • Walker, D.R. and Koonin, E.V. 1997. SEALS: A system for easy analysis of lots of sequences. Intell. Syst. Mol. Biol. 5 333–339. [PubMed]
  • Wiggins, C.A. and Munro, S. 1998. Activity of the yeast MNN1 α-1,3-mannosyltransferase requires a motif conserved in many other families of glycosyltransferases. Proc. Natl. Acad. Sci. 95 7945–7950. [PMC free article] [PubMed]
  • Withers, S.G., Wakarchuk, W.W., and Strynadka, N.C. 2002. One step closer to a sweet conclusion. Chem. Biol. 9 1270–1273. [PubMed]
  • Wrabl, J.O. and Grishin, N.V. 2001. Homology between O-linked GlcNAc transferases and proteins of the glycogen phosphorylase superfamily. J. Mol. Biol. 314 365–374. [PubMed]
  • Yona, G. and Levitt, M. 2002. Within the twilight zone: A sensitive profile-profile comparison tool based on information theory. J. Mol. Biol. 315 1257–1275. [PubMed]
  • Yona, G., Linial, N., and Linial, M. 2000. ProtoMap: Automatic classification of protein sequences and hierarchy of protein families. Nucleic Acids Res. 28 49–55. [PMC free article] [PubMed]
  • Yuan, Y.P., Schultz, J., Mlodzik, M., and Bork, P. 1997. Secreted fringe-like signaling molecules may be glycosyltransferases. Cell 88 9–11. [PubMed]

Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society
