NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Research Council (US) Committee on Assessing the Importance and Impact of Glycomics and Glycosciences. Transforming Glycoscience: A Roadmap for the Future. Washington (DC): National Academies Press (US); 2012.

5. The Toolkit of Glycoscience

The incredible opportunities for glycoscience in health, energy, and materials science described in the previous chapters and in the questions posed in Chapter 4 can only be realized with a set of new analytical tools. Today, glycoscience is practiced by a relatively small community of biologists, chemists, and materials scientists. This community must expand if glycoscience is to extend its impact and become pervasive. The broader scientific community must participate in the development of the necessary tools that will transform the field and empower researchers in both the glycoscience field and the larger scientific community to incorporate glycoscience into their research pursuits as a matter of course. To this end, glycoscience needs new analytical tools, including methods development for separation, purification, characterization, localization, and structure identification. The tools used today are limited in their capabilities and will not enable realization of glycoscience's full potential. The best analytical chemists and other measurement scientists, including tools developers, need to turn their attention to glycoscience and bring their creativity to the field. They need to apply existing tools and methods that have not yet been applied to glycoscience, and they need to develop new tools to solve analytical problems that existing tools cannot address.

Similarly, the synthesis community needs to begin to embrace glycochemistry as an essential field of organic chemistry. Glycochemistry needs to be brought into the mainstream of synthetic organic chemistry rather than kept as a specialized field practiced by only a handful of glycochemists. New synthetic methods need to be brought to bear on glycan synthesis, and the creativity of the entire synthesis community needs to be leveraged to solve the long-standing and vexing problems of stereoselective, regioselective syntheses with simple, high-yielding reactions. The biochemistry and genetics communities need to participate in identifying all enzymes and characterizing all pathways involved in glycan metabolism. Finally, computer scientists, modelers, and bioinformaticists need to be fully engaged. The community needs to set up glycoscience databases and integrate glycoscience into existing biological databases. Glycan and proteoglycan structure prediction and modeling tools need to be developed. Full interaction pathways must be developed to incorporate all aspects of glycobiology into systems biology. Details of these opportunities are described in the remainder of this chapter, but the main message is clear: Glycoscience needs the full participation of the broader scientific community to help develop tools that can solve some of the most vexing problems in glycoscience and to catalyze its integration into the scientific mainstream. Tools developed for glycoscience are also expected to have follow-on benefits for other fields of science.

5.1. SYNTHESIS

5.1.1. General Aspects

The development of routine procedures for automated chemical synthesis of oligonucleotide fragments (DNA and RNA) and peptides has brought significant change to modern biology. Unfortunately, no general methods are available for the preparation of complex carbohydrates (Boltje et al. 2009; Kiessling and Splain 2010). As a result, the synthesis of a target is often a research project unto itself, which may take many months and in some cases years to complete. This problem is compounded by the fact that glycoconjugates in biological samples are often found in low concentrations and in microheterogeneous forms, greatly complicating their isolation and characterization. Glycomes of eukaryotic organisms are extremely diverse; for example, it has been estimated that the human glycome contains 10,000 to 20,000 minimal epitopes for glycan-binding proteins (Cummings 2009). Thus, robust synthetic technologies are urgently needed that can readily provide large collections of complex oligosaccharides. Furthermore, biological and analytical studies often require glycans to be modified by a tag, immobilized to surfaces, presented at a multivalent scaffold, or attached to a lipid, peptide, or protein (Seeberger and Werz 2007; Rich and Withers 2009). As a result, additional technologies are required that can readily provide such conjugates.

Current approaches for obtaining well-defined oligosaccharides and glycoconjugates include chemical synthesis, enzymatic and chemoenzymatic synthesis, and microbial production (Boltje et al. 2009; Kiessling and Splain 2010; Hsu et al. 2011; Schmaltz et al. 2011). The next sections cover the scope and limitations of these methodologies. Despite their shortcomings, these technologies have been instrumental in addressing a number of important problems in glycobiology research and in the discovery of vaccines and therapeutics. In particular, the Consortium for Functional Glycomics, funded by the National Institute of General Medical Sciences, has employed a chemoenzymatic approach for the preparation of a collection of approximately 600 glycans derived from N- and O-linked glycoproteins and glycolipids (Stevens et al. 2006; Rillahan and Paulson 2011). These compounds are modified with an artificial aminopropyl linker, which allows covalent attachment to N-hydroxysuccinimide-activated glass slides. The resulting microarrays have found wide utility for interrogating binding specificities of a diverse range of glycan-binding proteins (GBPs), determining dissociation constants and dissecting binding energies, and analyzing multivalent and hetero-ligand binding. Glycan arrays have also been used to determine the ligand specificities of monoclonal antibodies and, because the interaction between viruses and host glycans is species specific, to rapidly assess influenza virus receptor specificity. A significant barrier to widespread use of glycan arrays, however, is the limited availability of well-defined oligosaccharides, and current arrays contain only a fraction of naturally occurring oligosaccharides. Also, arrays displaying very similar glycans can nevertheless provide significantly different results with regard to GBP binding. There are exciting challenges ahead before glycan arrays can become a standardized method of analysis.

Development of a fully synthetic heparin fragment for treatment of deep vein thrombosis exemplifies the importance of the organic synthesis of glycans. Heparin and heparan sulfate are naturally occurring linear polysaccharides that are modified by sulfate esters. A heparin-derived pentasaccharide that can bind to antithrombin III (AT III) and that exhibits anticoagulant activity has been identified. A fully synthetic analog (fondaparinux) of this domain has been developed, which is being produced on a multikilogram scale to treat deep vein thrombosis (Petitou and van Boeckel 2004). In contrast to porcine mucosal tissue-derived heparin, the synthetic compound is easy to characterize and has a much-improved subcutaneous bioavailability. The importance of synthetic oligosaccharides for anticoagulation therapy was highlighted by the recent discovery of batches of heparin that caused hypotension and resulted in nearly 100 deaths. These reactions resulted from contamination with oversulfated chondroitin sulfate, which is a popular shellfish-derived oral supplement for the treatment of arthritis (Guerrini et al. 2008). The ability to synthesize pure, well-characterized glycans would eliminate the need to rely on poorly characterized and highly variable glycans obtained from natural sources.

Heparin and heparan sulfate are examples of glycosaminoglycans (GAGs), which have been implicated in many other biological processes and can have pronounced physiological effects on lipid transport and absorption, cell growth and migration, and development (Bishop et al. 2007). Significant changes in the structure of GAGs have been observed in the stroma surrounding tumors, which is noteworthy when considering tumor growth and invasion. GAGs are also involved in neurobiological processes and, for example, have been implicated in neuroepithelial growth and differentiation, neurite outgrowth, nerve regeneration, axonal guidance and branching, deposition of amyloidotic plaques in Alzheimer's disease, and astrocyte proliferation. Large arrays of well-defined heparan sulfate oligosaccharides are needed to identify compounds that mediate or inhibit these processes. It is possible that synthetic analogs of heparin may find application in the treatment of several neurological diseases, cancer, and infection.

Synthetic oligosaccharides have also been used in the development of vaccines against such pathogens as Haemophilus influenzae type b, HIV, Plasmodium falciparum, Vibrio cholerae, Cryptococcus neoformans, Streptococcus pneumoniae, Shigella dysenteriae, Neisseria meningitidis, Bacillus anthracis, and Candida albicans (Costantino et al. 2011; Morelli et al. 2011). Polysaccharides isolated from natural sources, which are conjugated to a carrier protein, are used in prevention of life-threatening bacterial infectious diseases such as meningitis and pneumonia. However, the wide utility of this approach is limited by such problems as the destruction of vital immunodominant features during the chemical conjugation to a carrier protein. Furthermore, isolated polysaccharides often display structural heterogeneity, which may complicate reproducible production. These compounds may also contain toxic components or immunosuppressive domains that may be difficult to remove. These problems can be addressed by using chemically or enzymatically synthesized glycan epitopes. In such an approach a synthetic oligosaccharide is equipped with an artificial spacer to facilitate selective conjugation to a carrier protein. In general, antibodies recognize epitopes no larger than a hexasaccharide, and compounds of this complexity can be readily obtained by organic synthesis. The recent approval of a Haemophilus influenzae vaccine based on a synthetic glycan epitope highlights the potential use of organic synthesis for the development of glycoconjugate vaccines. The expansion of this capability to other vaccines would have a tremendous impact on both safety and efficacy and could potentially compress the timeframe for developing new vaccines, especially for new threats.

Synthetic oligosaccharides and glycopeptides are also being used in the development of cancer vaccines (Buskas et al. 2009). Oncogenic-transformed cells often overexpress oligosaccharides such as Globo-H, LewisY, and Tn antigen, and numerous preclinical and clinical studies have demonstrated that naturally acquired, passively administered, or actively induced antibodies that recognize glycan-associated tumor antigens are able to eliminate circulating tumor cells and micrometastases. The development of tumor-associated polysaccharides and glycopeptides as cancer vaccines has been complicated by the fact that they are self-antigens and therefore are tolerated by the immune system. The problem of self-tolerance is being addressed by the design, chemical synthesis, and immunological evaluation of fully synthetic vaccine candidates.

Synthetic oligosaccharides are also used for the preparation of glycopolymers, glycodendrimers, and glyconanoparticles (Garcia et al. 2010). These materials are receiving considerable attention because monovalent polysaccharides often exhibit weak affinities for their protein receptors. However, glycan-binding proteins often exist as higher-order oligomeric structures presenting multiple binding sites, acting as “polydentate” donors, and thereby circumventing the intrinsic weak binding interactions of monovalent ligands. Also, gold nanoparticles, quantum dots, and magnetic nanoparticles provide additional functionality: they allow detection by surface plasmon resonance (SPR) or fluorescence, or they permit convenient isolation by using a strong magnetic field.

An increasing number of drugs contain glycans or glycomimetics as a major component. Examples include many antibiotics, antiviral drugs such as Relenza and Tamiflu, hyaluronic acids, and selectin antagonists (Gantt et al. 2011b). Synthetic oligosaccharides are also being used in the preparation of well-defined glycoproteins. Approximately one-quarter of new approvals are protein-based drugs, with a majority being glycoproteins. The glycan moieties of glycoproteins play an important role in their pharmacokinetic properties. Hence, it is critical to control the exact chemical composition of the oligosaccharide moieties of glycoproteins. Protein glycosylation is, however, not under direct genetic control and results in the formation of a heterogeneous range of glycoforms that possess the same peptide backbone but differ in the nature and site of glycosylation. In general, it is difficult to control glycoform formation in cell culture, which is a major obstacle for the development of therapeutic glycoproteins.

In summary, a diverse set of glycan structures can be used to discover specific inhibitors of glycosyltransferases with pharmaceutical applications, including diagnostics. Large arrays representing the diversity of glycans or focused on a specific set of glycan structures could be used for drug screening and discovery. Access to diverse glycans via synthesis for preparing such arrays would lead to new uses that have yet to be imagined.

5.1.2. Synthetic Tools

5.1.2.1. Chemical glycan synthesis

The realization that complex oligosaccharides and glycoconjugates are involved in numerous biological processes has stimulated development of chemical and enzymatic methods for the preparation of oligosaccharides. Unlike oligonucleotide and peptide synthesis, there are no general protocols for the preparation of glycans. As a result, the synthesis of specific targets is often a demanding and time-consuming task (Boltje et al. 2009; Kiessling and Splain 2010). However, recent technological advances are making it possible to streamline the process of oligosaccharide assembly and are providing opportunities to prepare collections of oligosaccharides and glycoconjugates.

The chemical synthesis of glycans involves coupling a fully protected glycosyl donor, which bears a leaving group at its anomeric center, with a suitably protected glycosyl acceptor that often contains only one free hydroxyl (Zhu and Schmidt 2009). The result of this chemical reaction is a glycoside product. The process of sequentially generating glycosyl donors and acceptors can be repeated until a complex target has been obtained. Preparation of monosaccharide building blocks requires extensive protecting-group manipulations, and typically 6 to 10 chemical steps are needed to create each building block. Because preparation of the monomer building blocks consumes the majority of effort invested in chemical glycan synthesis, rapid and inexpensive access to building blocks should greatly accelerate chemical glycan synthesis. One approach to speed up monosaccharide synthesis involves parallel combinatorial sequential one-pot multistep procedures for the selective protection of monosaccharides (Wang et al. 2007). This approach can incorporate as many as seven chemical steps, obviating the need to carry out intermittent tedious work-up and time-consuming purifications. A complementary approach involves identification of monosaccharide building blocks that can be used repeatedly for the synthesis of a wide range of target structures. For example, it has been proposed that 88 percent of glycoside motifs found in mammalian glycoconjugates could be constructed from only 20 different monosaccharide building blocks, which could then be prepared in bulk for widespread use in automated glycan synthesis technology (Werz et al. 2007). In addition, disaccharide building blocks have been identified that resemble the saccharide motifs found in heparan sulfate and that can be used repeatedly for rapid assembly of libraries of heparan sulfate oligosaccharides. This building block approach could be extended to many other classes of oligosaccharides, although because of the enormous structural diversity of natural glycans, it will be important to focus on glycomes of particular interest.
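
The idea that a small set of reusable building blocks can cover most target motifs can be illustrated as a simple coverage calculation. The sketch below is not the analysis of Werz et al. (2007); it is a minimal Python illustration, and the motif names, occurrence counts, and building-block names are hypothetical placeholders. It assumes that a motif can be built only when every building block it requires is available.

```python
from collections import Counter

# Hypothetical data: motif -> (occurrence count, building blocks required).
# None of these names or numbers come from the cited studies.
MOTIFS = {
    "Gal-b1,4-GlcNAc": (40, ("Gal_donor_b", "GlcNAc_4OH")),
    "GlcNAc-b1,3-Gal": (25, ("GlcNAc_donor_b", "Gal_3OH")),
    "Fuc-a1,3-GlcNAc": (15, ("Fuc_donor_a", "GlcNAc_3OH")),
    "Neu5Ac-a2,3-Gal": (12, ("Neu5Ac_donor_a", "Gal_3OH")),
    "Man-b1,4-GlcNAc": (8, ("Man_donor_b", "GlcNAc_4OH")),
}


def coverage(chosen):
    """Fraction of motif occurrences that can be built from the chosen blocks."""
    total = sum(count for count, _ in MOTIFS.values())
    covered = sum(
        count for count, needed in MOTIFS.values() if set(needed) <= set(chosen)
    )
    return covered / total


def rank_blocks():
    """Rank building blocks by the number of motif occurrences that use them."""
    usage = Counter()
    for count, needed in MOTIFS.values():
        for block in needed:
            usage[block] += count
    return [block for block, _ in usage.most_common()]


if __name__ == "__main__":
    ranked = rank_blocks()
    # Show how coverage grows as the most heavily used blocks are added.
    for n in range(1, len(ranked) + 1):
        print(n, "blocks:", f"{coverage(ranked[:n]):.0%} of motif occurrences")
```

With real motif frequencies, a calculation of this kind would indicate which building blocks are worth preparing in bulk first and how quickly coverage saturates as the set grows.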

The success of the unified building block approach relies on the premise that each glycosylation will be high yielding. In practice, this has been shown to not be the case, and additional saccharide building blocks are required to address possible synthetic difficulties. Sets of orthogonal protecting groups are being developed that provide additional synthetic flexibility in that they offer the possibility to change the order of glycosylation. Stereoselective installation of glycosidic bonds in high yield is a major challenge in complex oligosaccharide synthesis. In recent years this aspect of oligosaccharide synthesis has progressed considerably, and a wide range of stable yet highly reactive anomeric leaving groups have become available, making it possible to examine several glycosylation protocols to achieve optimal results (Zhu and Schmidt 2009). By exploiting neighboring-group participation, steric and conformational effects, or direct displacement of leaving groups, glycosides can be obtained with high anomeric selectivity, even for the more challenging α-sialyl and β-mannosyl linkages. However, glycoside products are often contaminated by unwanted anomeric products, making it necessary to include time-consuming purification protocols.

Minimizing purification steps has been the focus of efforts to streamline chemical synthesis of oligosaccharides. Approaches that are being pioneered include one-pot multistep solution-phase glycan synthesis, solid-phase glycan synthesis, and fluorous tagging. One-pot multistep procedures are based on the sequential addition of glycosyl donors with well-defined anomeric reactivity to a reaction flask to provide an oligosaccharide without the need to purify synthetic intermediates (Kaeothip and Demchenko 2011). Although many variations of the one-pot strategy have been developed, there are three major concepts: chemoselective, orthogonal, and preactivation glycosylation strategies. In chemoselective glycosylation strategies, glycosyl donors with decreasing anomeric reactivity are allowed to react sequentially. Orthogonal glycosylations use glycosyl donors and acceptors that have different anomeric groups that can be activated without affecting each other. A flexible approach that exploits the advantages of the aforementioned strategies utilizes preactivation of a glycosyl donor to generate a reactive intermediate in the absence of the acceptor. After addition of the glycosyl acceptor, a glycoside product can be formed that has an identical leaving group at the reducing end. In the same reaction flask the process of anomeric activation and glycosylation can be repeated to construct complex oligosaccharides. Successful implementation of this strategy requires that the promoter be completely consumed to prevent activation of a subsequent saccharide building block. Furthermore, the reactive intermediate should be sufficiently long lived to permit addition of a glycosyl acceptor yet sufficiently reactive for a high-yielding glycosylation. It has been difficult to design glycosylations that meet these requirements.
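
The chemoselective strategy described above depends on each donor being clearly more reactive than the one added after it. A minimal Python sketch of that planning check follows. The relative reactivity numbers, the donor names, and the minimum ratio of 10 are all hypothetical assumptions made for illustration; the source does not give numerical values.

```python
def check_one_pot_order(blocks, min_ratio=10.0):
    """Check a chemoselective one-pot plan.

    blocks: list of (name, relative_reactivity) pairs in order of addition,
            from the first donor activated to the last.
    min_ratio: assumed minimum reactivity ratio between consecutive donors
               needed for selective activation (hypothetical threshold).
    Returns a list of (earlier, later, ratio) tuples for pairs that fail.
    """
    problems = []
    for (name_a, r_a), (name_b, r_b) in zip(blocks, blocks[1:]):
        if r_a <= 0 or r_b <= 0:
            raise ValueError("relative reactivities must be positive")
        ratio = r_a / r_b
        if ratio < min_ratio:
            problems.append((name_a, name_b, ratio))
    return problems


# Hypothetical plan: reactivity should fall steeply from one donor to the next.
plan = [("donor_A", 5000.0), ("donor_B", 300.0), ("donor_C", 100.0)]

if __name__ == "__main__":
    for first, second, ratio in check_one_pot_order(plan):
        print(f"{first} -> {second}: reactivity ratio {ratio:.1f} may be too small")
```

A check of this kind only flags pairs of donors whose reactivities are too close. It says nothing about stereoselectivity or about the lifetime of the activated intermediate, which the preactivation strategy must also satisfy.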

Encouraged by successes with polymer-supported peptide synthesis, the first attempts at solid-phase oligosaccharide synthesis were reported in the 1970s. These efforts were not successful, largely because of a lack of efficient glycosylation methods. The past decade has seen a renewed interest in polymer-supported glycan synthesis, and different polymer support materials, linkers, synthetic strategies, and glycosylating agents have been explored (Seeberger 2008). However, a general solution for routine and automated oligosaccharide synthesis remains to be established, in large part because of the need for large excesses of glycosyl donors, the lack of anomeric control when 1,2-cis-glycosides need to be installed, the unpredictability of glycosylations, and the additional steps required for linker functionalization and protecting-group removal. However, progress is being made in these areas, bringing the promise of routine automated oligosaccharide synthesis closer to fruition. In particular, it has been shown that automated synthesis can provide complex oligosaccharides such as a branched β-glucan dodecasaccharide, blood-group oligosaccharides, and tumor-associated glycan antigens.

Solution-based strategies have been developed in which the growing oligosaccharide chain is modified by a tag that allows selective precipitation, extraction, or absorption for convenient purification. In particular, light fluorous tagging technology is attractive because it makes possible protecting-group manipulations and glycosylations under conditions typically used in solution-phase chemistry (Jaipuri and Pohl 2008). In this approach, tagged products can be selectively captured by a fluorous solid-phase extraction column and then released by elution with methanol. Fluorous-tagged glycans can also be directly printed on fluorocarbon surfaces, providing interesting opportunities for glycan array development. Efforts are under way to automate fluorous-supported synthesis of oligosaccharides with liquid handling devices.

Despite considerable progress, chemical synthesis of glycans remains a challenging endeavor that is practiced only by expert laboratories. The lack of large collections of universal building blocks and the need to optimize glycosylation conditions complicate routine synthesis of this class of compounds. There is an urgent need for the development of more reliable glycosylation protocols, which might be accomplished with a better understanding of the mechanistic aspect of glycosylations. Furthermore, new approaches need to be developed for controlling anomeric selectivities of glycosylations. In particular, reliance on extensive protection-deprotection schemes adds to the number of steps and reduces the yields of complex glycans. High-throughput protocols for the rapid evaluation of many reaction conditions—for example, by employing microfluidics devices—may provide more reliable glycosylation protocols. It is also to be expected that searchable databases of reported glycosylations can accelerate the optimization process and may lead to standardized protocols.

5.1.2.2. Enzymatic synthesis of glycans

Enzyme-catalyzed glycosylations offer an approach that is complementary to chemical synthesis for obtaining structurally well-defined oligosaccharides, polysaccharides, and glycoconjugates. Current enzymatic and chemoenzymatic approaches apply glycosyltransferases, glycosidases, glycosynthases, and trans-glycosidases to construct glycosidic linkages, whereas enzymes such as sulfotransferases, epimerases, and acetyltransferases have been used for postglycosylation modifications (Schmaltz et al. 2011). (See also Section 5.4.2.1, which discusses glycan synthesis in the context of glycoenzyme applications.)

Glycans are synthesized in an assembly-line manner by enzymes such as glycosyltransferases. In this process the product of one enzyme becomes the substrate of the next. The glycosyltransferases form a connection, termed the glycosidic linkage, between a growing glycan chain and another sugar building block. The most common building blocks for glycosyltransferases are called nucleotide sugar donors, and the structures to which those building blocks are added are generally referred to as glycosyl acceptors. In general, there is a unique glycosyltransferase for nearly every type of glycosidic linkage formed, and these enzymes are among the most specific enzymes known. They are able to distinguish the spatial orientation of a single atom, even on very large structures, within both their glycosyl donors and acceptors.
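
The assembly-line behavior described above, in which each enzyme acts only on the product of the previous one, can be sketched in a few lines of Python. This is a toy model under stated assumptions: the glycan is treated as a linear chain, each enzyme is reduced to a single rule (acceptor terminus, donor sugar, linkage), and the enzyme labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glycosyltransferase:
    name: str          # hypothetical enzyme label
    acceptor_end: str  # terminal residue the enzyme recognizes
    donor_sugar: str   # sugar transferred from the nucleotide sugar donor
    linkage: str       # glycosidic linkage formed

    def act(self, chain):
        """Extend the chain if its terminal residue matches the acceptor;
        otherwise return the chain unchanged."""
        if chain and chain[-1][0] == self.acceptor_end:
            return chain + [(self.donor_sugar, self.linkage)]
        return chain


def run_pipeline(start, enzymes):
    """Apply the enzymes in order; each sees the product of the previous one."""
    chain = list(start)
    for enzyme in enzymes:
        chain = enzyme.act(chain)
    return chain


# Chain entries are (residue, linkage to the previous residue).
start = [("GlcNAc", None)]
gal_t = Glycosyltransferase("GalT (illustrative)", "GlcNAc", "Gal", "b1-4")
sia_t = Glycosyltransferase("SiaT (illustrative)", "Gal", "Neu5Ac", "a2-3")

if __name__ == "__main__":
    print(run_pipeline(start, [gal_t, sia_t]))  # both enzymes act in turn
    print(run_pipeline(start, [sia_t, gal_t]))  # SiaT finds no Gal terminus
```

Reversing the order of the two enzymes gives a shorter product, because the second enzyme's acceptor does not exist until the first enzyme has acted. Real glycosyltransferases recognize far more of the acceptor structure than its terminal residue, so this model illustrates only the ordering constraint.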

Because of this linkage specificity, glycosyltransferases are essential enzymes for oligosaccharide biosynthesis. Their ability to transfer a sugar residue from a sugar nucleotide mono- or diphosphate to a maturing oligosaccharide chain (Lairson et al. 2008) and their highly regio- and stereoselective nature make them ideally suited for the preparation of complex oligosaccharides. Currently, more than 50,000 genes encoding potential glycosyltransferases have been identified, although only a very small number have been characterized. Many of these enzymes are membrane bound and glycosylated, making their isolation and utilization difficult. The number of glycosyltransferases with good catalytic activity and defined substrate specificity that can be expressed in soluble form in large amounts is currently small, which hampers efforts to develop enzymatic glycoconjugate synthetic schemes. However, several activities are under way to address these problems. High-throughput assays are being developed to identify activities and substrate specificities of glycosyltransferases. Furthermore, it has been found that glycosyltransferases from bacterial sources often exhibit considerable substrate promiscuity, thereby offering unique opportunities for chemoenzymatic synthesis of glycan libraries and their derivatives. The substrate specificity of glycosyltransferases can be altered by protein crystal structure-based rational design or directed evolution. However, glycan-modifying enzymes are exceptionally underrepresented in structural databases, particularly as enzyme substrate complexes, and the resulting incomplete and fragmentary collection of biochemical and structural data on this class of enzymes has led to an incomplete understanding of the molecular mechanisms that control oligosaccharide biosynthesis. Recently, the Repository of Glyco-enzyme Expression Constructs was created to focus on generating expression vectors that encode all human glycosyltransferases and glycoside hydrolases as well as a limited set of glycan-modifying enzymes for production in bacteria, insect cells (baculovirus), and mammalian cells. The goal of the repository is to facilitate the production of soluble forms of enzymes as catalytic domains, when possible, for use in biochemical, enzymatic, and structural studies. Many of the constructs have been designed as truncated forms devoid of the transmembrane protein domain and linked to affinity tags or other larger fusion proteins to facilitate affinity purification and quantification.

Several convenient approaches have been developed for the preparation of sugar nucleotides, key substrates for all glycosyltransferases, including in situ sugar nucleotide regeneration, fusion protein strategies, one-pot multienzyme systems, and superbead technologies. Progress is being made in identifying and characterizing many sugar nucleotide biosynthetic enzymes, including those involved in salvage pathways. However, many uncommon sugar nucleotides for glycosyltransferase-catalyzed synthesis of glycosylated natural products are less accessible because of their much more complicated biosynthetic pathways and instability.

The combined use of glycosyltransferases, sulfotransferases, and epimerase has been successfully implemented for the synthesis of structurally defined heparin and heparan sulfate oligosaccharides, as well as for polysaccharides with specific sulfation patterns. In particular, these developments have led to the production of ultra low molecular weight heparins in 10 to 12 steps, with an overall yield of 40 to 50 percent (Xu et al. 2011). This compares well with the process now used to make the anticoagulant drug Arixtra, which involves some 50 steps and has an overall yield of less than 1 percent.
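
The step counts and overall yields quoted above can be converted into an average yield per step, which makes the comparison easier to interpret. The Python sketch below assumes a strictly linear sequence in which every step has the same yield, so that the overall yield equals the per-step yield raised to the number of steps. This is a simplification: real routes may be convergent, and the 1 percent figure is used as an upper bound because the text states only "less than 1 percent".

```python
def average_step_yield(overall_yield, steps):
    """Geometric-mean yield per step for a linear sequence of equal steps."""
    if not 0 < overall_yield <= 1 or steps < 1:
        raise ValueError("need 0 < overall_yield <= 1 and steps >= 1")
    return overall_yield ** (1.0 / steps)


# Figures taken from the paragraph above; the pairing of step counts with
# yields within each quoted range is an assumption made for illustration.
routes = {
    "chemoenzymatic, 12 steps, 40% overall": (0.40, 12),
    "chemoenzymatic, 10 steps, 50% overall": (0.50, 10),
    "chemical, 50 steps, 1% overall (upper bound)": (0.01, 50),
}

if __name__ == "__main__":
    for label, (overall, steps) in routes.items():
        per_step = average_step_yield(overall, steps)
        print(f"{label}: about {per_step:.1%} per step")
```

Under these assumptions the average per-step yields of the routes are similar, roughly 91 to 93 percent. This suggests that most of the gain in overall yield comes from the large reduction in the number of steps, not from more efficient individual steps.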

Currently, microfluidics and microarray formats are being explored to conduct enzymatic syntheses. These types of approaches can be used to create a wide range of products that can be analyzed using mass spectrometry and then assayed for biological activity. At the other extreme, work is being conducted to develop macroscale enzyme-assisted syntheses of heparin, although a major obstacle in this effort is the cost of a critical cofactor, 3′-phosphoadenyl-5′-phosphosulfate (PAPS), which donates the high-energy sulfate groups that are covalently attached to the heparin backbone to make bioactive heparan sulfate. One solution is to regenerate PAPS in situ enzymatically from the byproduct 3′-phosphoadenyl-5′-phosphate (PAP), a process similar to that used in large-scale oligosaccharide synthesis with sugar nucleotide regeneration.

Unlike chemical synthesis, the synthesis of unnatural saccharide sequences is a challenge for enzyme-based methods as a result of the strict substrate specificities of most glycan-synthesizing enzymes. To overcome this limitation, additional studies of heparan sulfate biosynthetic enzymes are necessary, especially to advance our understanding of the substrate specificities and to conduct mechanistically based mutagenesis to engineer the specificities.

Glycosyl hydrolases are a class of enzymes that degrade oligosaccharides by cleavage of glycosidic linkages. The reverse hydrolytic activity of this class of enzymes can be exploited for glycosidic bond synthesis. This approach suffers, however, from relatively low yields because of the challenge of driving reactions in a thermodynamically unfavorable direction. This problem has been addressed by the introduction of “glycosynthases,” which are glycosidases rendered hydrolytically incompetent by replacement of a nucleophilic aspartic or glutamic acid with an alternative unreactive amino acid. Glycosynthases can, however, transfer activated glycosyl substrates that have the opposite anomeric configuration of the natural substrate. For example, β(1,4)-mannans, which are major plant cell wall polysaccharides, have been prepared by using a mutant endo-β-mannanase and an α-mannobiosyl fluoride as glycosyl donor. In addition, glycosynthases have been used for the preparation of β-linked glucuronic and galacturonic acid conjugates. Currently, only a small number of glycosynthases have been developed, which limits the scope of this technology. Recent studies have shown that the catalytic activities and substrate promiscuities of glycosynthases can be improved by directed evolution.

5.1.3. Manipulating Glycans by Pathway Engineering

All cells have endogenous machinery for synthesizing glycans and glycoconjugates. The glycan-modifying infrastructure in cells includes glycosidases, glycosyltransferases, mechanisms for activated sugar synthesis and transport, and supporting functions, which are under coordinated control to orchestrate the formation of precise glycan structures. The vast majority of glycans and glycoconjugates pose significant challenges to synthetic endeavors. As such the goal of pathway engineering approaches is to manipulate the biosynthetic power of cells to transform them into synthetic tools for the production of specific glycan products. Over the past 10 years, glycoengineering approaches have provided new routes to producing human glycoproteins with defined glycoforms by manipulating mammalian and other cellular glycosylation pathways, by creating glycoprocessing enzymes with novel properties, and by modifying nonmammalian systems. Successes include humanizing the N-glycosylation pathways of yeast, insect, and plant cell systems and introducing protein glycosylation pathways into E. coli. Glycoengineering efforts have also transformed bacteria and other microbes into factories for glycan production and natural product modification and into tools to study and thwart glycan-related pathogenic mechanisms. Key to these strategies is the manipulation and augmentation of the endogenous cellular glycoprocessing infrastructure with novel enzyme activity. This involves the engineering of several genes for biosynthesis, activation, transport, and transfer of mono- and oligosaccharides.

Mutation, knockout, and inhibition present possibilities for controlling or simplifying N- and/or O-linked glycosylation profiles in mammalian cell lines. Within this category, cell lines with genetic mutations in glycosylation pathways have been a valuable resource for studying glycobiology and continue to represent powerful tools for producing glycoconjugates with more tailored glycans. Although not technically glycoengineering, RNA interference and small-molecule inhibitors of glycosylation also can be used to knock down the activity of glycoprocessing enzymes and produce simplified glycan structures for further elaboration.

Nonmammalian systems demonstrate the power of domain engineering to provide novel catalytic activity in the secretory pathway. In yeast and plant systems, domain engineering has been critical to introducing the necessary mammalian-type glycoprocessing activity to generate mammalian N- and O-linked glycans, a process called “humanizing.” Metabolic glycoengineering of E. coli has been a successful method of producing human complex glycan structures, too. These schemes focus on generating the glycan component in the bacterial cytosol by using microbial glycoprocessing enzymes. Key steps include engineering an appropriate glycan acceptor scaffold that cannot be metabolically diverted and engineering appropriate glycoprocessing infrastructure to build on acceptor scaffolds, including glycosyltransfer enzymes and supporting nucleotide sugar synthesis and transport.

Over the past 10 years, two primary applications of metabolically glycoengineered E. coli have been demonstrated. In one application, E. coli are transformed into factories for the production of high-value glycans. In another, glycan structures are displayed on chimeric lipooligosaccharides at the bacterial cell surface, which can be used to study bacterial glycomimicry, to develop probiotics, and for drug screening. Both approaches demonstrate that biologically relevant glycan epitopes can be efficiently generated using very different engineered acceptor molecules, suggesting that further applications for metabolic glycoengineering can be developed through the engineering of appropriate acceptor molecules, perhaps even glycoproteins. Metabolically glycoengineered E. coli cells are compatible with large-scale fermentation and can provide access to significant quantities of complex glycans, making them of interest for both research and clinical applications.

5.1.4. Synthesis of Standards for Mass Spectrometry

Quantification and characterization of complex mixtures of glycans are underdeveloped, complicating glycome and glycoproteome analysis of biological samples. The current practice for quantification is that individual components of glycan profiles generated by mass spectrometry are quantified relative to each other. While this normalized parameter has been broadly useful for comparing major changes in profiles across samples or biological conditions, its value for characterizing minor glycans is frequently subject to undue influence by the major components, as teasing numerator and denominator effects apart can be difficult for low-abundance but biologically significant glycans. Generating methods for absolute quantification by mass spectrometry would allow profile changes to be assessed glycan by glycan, independent of variations in the whole profile. A set of well-characterized oligosaccharide standards is essential to realize this potential. The availability of glycan standards that contain isotopic labels would add an additional level of utility and would make it possible to accurately identify and quantify glycans in complex biological samples.
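The numerator and denominator effect described above can be illustrated with a small calculation. The following Python sketch uses hypothetical amounts (the glycan names and picomole values are invented for illustration and are not measured data). The minor glycan is held constant in absolute terms, yet its relative abundance doubles simply because the major component decreases.

```python
def relative_abundance(amounts):
    """Normalize absolute amounts so that the whole profile sums to 1."""
    total = sum(amounts.values())
    return {name: value / total for name, value in amounts.items()}


# Hypothetical absolute amounts in pmol (illustrative values only).
control = {"major_glycan": 900.0, "minor_glycan": 100.0}
treated = {"major_glycan": 400.0, "minor_glycan": 100.0}

rel_control = relative_abundance(control)
rel_treated = relative_abundance(treated)

# The minor glycan is unchanged in absolute terms ...
absolute_change = treated["minor_glycan"] - control["minor_glycan"]
# ... but its share of the profile changes because the denominator shrank.
relative_fold_change = rel_treated["minor_glycan"] / rel_control["minor_glycan"]

print(f"absolute change of minor glycan: {absolute_change} pmol")
print(f"relative fold change of minor glycan: {relative_fold_change:.2f}")
```

In this sketch a relative profile would suggest that the minor glycan increased, whereas an absolute measurement, for example against an isotopically labeled standard of known amount, would show no change.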

In addition to facilitating glycan quantification, well-characterized oligosaccharide standards would provide substrates for elucidating fragmentation rules for multiple mass spectrometry schemes. The assignment of glycan structures is challenging because of the isobaric nature of glycans—that is, different glycan structures can have identical molecular weights. For instance, at the monosaccharide level, all hexoses have the same molecular weights, as do the corresponding HexNAc derivatives. For extended structures the sequences are often identical but with different branching patterns. Also, glycosidic linkages can have two anomeric configurations, which cannot be assigned by simple mass determination. It is, however, expected that this problem can be addressed by detailed mass spectrometry analysis of well-defined glycan standards. In this respect, isomers of oligosaccharides should produce unique fragmentation patterns that can be used for compound identification. A comprehensive analysis of a wide range of oligosaccharide standards will also create invaluable information for the bioinformatics community to build programs similar to the high-throughput software, such as the MASCOT and Sequest programs, now used to identify peptide sequences from proteomics data.
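The isobaric problem at the monosaccharide level can be made concrete by computing masses from elemental compositions. The sketch below is illustrative only: the helper name is hypothetical, and the monoisotopic atomic masses are standard values rounded to eight decimal places. Because glucose, galactose, and mannose share the formula C6H12O6, and GlcNAc and GalNAc share C8H15NO6, mass alone cannot tell the members of each group apart.

```python
# Monoisotopic atomic masses in daltons (standard values, rounded).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def monoisotopic_mass(formula):
    """Sum atomic masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


# Free monosaccharides; stereoisomers share one elemental formula.
HEXOSES = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "galactose": {"C": 6, "H": 12, "O": 6},
    "mannose": {"C": 6, "H": 12, "O": 6},
}
HEXNACS = {
    "GlcNAc": {"C": 8, "H": 15, "N": 1, "O": 6},
    "GalNAc": {"C": 8, "H": 15, "N": 1, "O": 6},
}

hexose_masses = {name: monoisotopic_mass(f) for name, f in HEXOSES.items()}
hexnac_masses = {name: monoisotopic_mass(f) for name, f in HEXNACS.items()}

for name, mass in {**hexose_masses, **hexnac_masses}.items():
    print(f"{name:10s} {mass:.4f} Da")
```

Every hexose prints the same mass, as does every HexNAc, which is why fragmentation patterns from well-defined standards, rather than intact masses, are needed to distinguish such isomers.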

5.1.5. Key Messages on Glycan Synthesis

Well-defined complex oligosaccharides can be obtained by chemical synthesis, enzyme catalyzed reactions, and fermentation. Over the past 30 years, tremendous advances have been made in chemical and enzymatic synthesis of glycans. However, no routine synthesis processes exist for glycans. As a result, synthetic glycans remain relegated to small quantities and specialized laboratories, and current methods are not sufficiently robust to permit preparation of large collections of compounds for analytical and structure-activity relationship studies. Technology for the preparation of well-defined glycoconjugates, such as glycoproteins, is still in its infancy. For glycoscience to move ahead, significant further progress in synthesis will be needed. This progress is likely to include better methods for specific and selective glycosylation with less reliance on both protecting groups and linear strategies, the ability to obtain and characterize more enzymes with the requisite specificities for making any glycosidic linkage, and improved understanding of the genes involved in glycan synthesis, along with tools for engineering new pathways to make any desired glycan structure.

5.2. ANALYSIS

The “analysis” tools discussed in this report cover a broad range of chemical, physical, biological, computational, mathematical, and engineering techniques, albeit necessarily focused on applications to areas of glycoscience. However, many advances in analytical tools to dramatically improve current techniques are anticipated to draw on a broad base of expertise and talents, potentially from a wide-ranging group of scientists with markedly different backgrounds from those of current glycoscientists. Many unanticipated developments and novel ideas relevant to the areas of analysis detailed below may spring from scientists devoted to a variety of analytical methods outside what might now typically be used or even considered by glycoscientists. This report therefore stresses that involvement of the broader analytical community should be welcomed in the development of novel technologies to address current problems and shortcomings, as outlined below, and that participatory collaborations be fostered between the broader analytical community and glycoscientists.

Because many truly transformative new technologies may be difficult to envision or even imagine, it can be anticipated that the requirements for individuals with skills from varied backgrounds, the degree of overlap in their expertise, their numbers, the collaborative modes, and the resources required to implement interesting new techniques may be equally difficult to foresee. Thus, the field will need to be flexible about the ideas and the nature, size, and types of collaborations necessary to meet specific challenges, with the consideration that at least some may be potentially high-risk yet high-payoff developments.

The topic of analysis can be subdivided into components that focus on several key aspects:

  1. analysis of glycan molecules themselves, including techniques for their disassembly, separation, analysis of purity, and analysis of primary and three-dimensional structures;
  2. analysis of glycoconjugates, which includes analysis of the molecular components of glycans conjugated to other molecules such as peptides, proteins, and lipids, particularly for mucins, proteoglycans, peptidoglycans, and lipopolysaccharides;
  3. for many types of glycans, analysis of their interactions with proteins or higher-level structural interactions that occur in many types of cell walls;
  4. for some glycans, analysis of their roles in metabolic functions related to a cell or an organism's pathways of energy utilization;
  5. analysis of the relationship of glycan structures to the genome, which includes analysis of the enzymes that synthesize and degrade glycans—the glycosyltransferases and glycosidases, the precursor nucleotide sugars—in addition to understanding the phenotypic effects observed in cells/organisms as a result of their mutation and/or cell-specific genetic ablation; and
  6. analysis of the locations of specific glycan structures in cells or tissues of organisms through molecule-specific imaging techniques.

Ultimately, the goal of improving structural techniques, which is related to the first two subtopics, is to understand the roles of glycans in various biological processes, including their interactions (subtopic 3), for some glycans their roles in metabolism (subtopic 4), how their synthesis and degradation are controlled (subtopic 5), and where specific structures are expressed in organisms (subtopic 6). Each of these items is addressed in more detail in the following sections.

5.2.1. Analysis of Primary Glycan Structures

Perhaps the greatest advancement in accelerating the fields of genomics and proteomics was the development of accurate, sensitive, and rapid methods for determining the primary structures of these biopolymers. Indeed, the development of automated, enzyme-based sequencing technologies launched both genomics and proteomics as viable and productive fields of research.

The field of glycan sciences offers a different challenge because of the diversity of monomers across organisms and the inherent variations in the way these monomers are connected to one another. To begin with, sugar building blocks are not the same for many divergent groups of organisms. The monomers are isomeric and can include different ring sizes (five- or six-membered), anomeric configuration (α or β), absolute configuration (D or L), and aldoses versus ketoses (a carbonyl group at C-1 as opposed to C-2 or other positions). The monomers can also contain many variants of the basic sugar molecule, including many variants of deoxysugars (at different positions) and other substitutions of hydroxyl groups, with, for example, amino, sulfate, phosphate, acyl or alkyl functional groups (Schaffer 1972; Horton and Wander 1980; Williams and Wander 1980; Kremer and Gallo-Rodriguez 2004). In addition, sugars link to each other via glycosidic bonds in which the anomeric carbon (usually C-1 or C-2) of one sugar can potentially attach to any of the hydroxyl groups of another sugar. Moreover, each monomer can be linked to more than one other sugar, leading to the formation of branched polysaccharides.
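A rough count shows how quickly these variables multiply. The sketch below enumerates the disaccharides that can be formed from one fixed, ordered pair of hexoses under deliberately narrow assumptions that are not taken from this report: both sugars are pyranoses, the donor links through its anomeric carbon (C-1) in either the α or β configuration, and the acceptor offers its non-anomeric hydroxyl groups at positions 2, 3, 4, and 6. Ring-size variants, linkages between two anomeric carbons, and branching are ignored, so the true number is larger.

```python
from itertools import product

# Simplifying assumptions (see the text above): pyranose rings only,
# donor linked through C-1, acceptor hydroxyls at positions 2, 3, 4, and 6.
ANOMERS = ("alpha", "beta")
ACCEPTOR_POSITIONS = (2, 3, 4, 6)


def disaccharide_isomers(donor, acceptor):
    """List linkage isomers for one ordered donor/acceptor pair."""
    return [
        f"{donor}({anomer}1-{position}){acceptor}"
        for anomer, position in product(ANOMERS, ACCEPTOR_POSITIONS)
    ]


isomers = disaccharide_isomers("Gal", "Glc")
for name in isomers:
    print(name)
print(f"{len(isomers)} isomers from a single ordered pair of sugars")
```

By comparison, two amino acids joined in a fixed order give a single dipeptide, which is one reason sequencing strategies developed for linear biopolymers do not transfer directly to glycans.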

Nucleic acid and protein analysis made major advances with the discovery of enzymes that selectively cleave large molecules into smaller ones, which can be more easily sequenced. Analogous enzymes, called “endoglycanases,” exist in nature, as do enzymes that selectively cleave glycans from proteins or lipids. Isolating these enzymes and harnessing their ability to selectively cleave glycans at specific linkages could provide a critical set of tools for glycan sequencing efforts. It may also be possible to develop selective chemical reactions, perhaps with site-specific catalysts, to facilitate the deconstruction of glycans into smaller polysaccharides or monosaccharides that could be more easily sequenced. In this regard the development of nanopore sequencing, in which linear DNA strands are translocated through a pore as the bases are read, portends a similar approach for linear glycans.

Once glycans are broken down into smaller pieces, it will be necessary to separate the resulting sets of polysaccharides and monosaccharides for analysis. These mixtures will contain not only molecules with different numbers of sugars but also many sets of isomeric molecules that might contain not only the same sugars with different linkages between them but also individual sugars that vary at a single stereochemical site. While current technologies, including high-performance liquid chromatography, are adept at separating polysaccharides that differ in the number or type of sugars, current technologies are not as successful at separating isomeric structures. Many other separation tools, such as immobilized glycan-binding protein columns, electrophoresis, and ion mobility separation of glycans as ions, among others, are being investigated, but today the separation and assessment of isomeric heterogeneity are the most time-consuming parts of structural determination.

The types of techniques likely to be needed in glycoscience include those that are “purely analytical” and those that are “analytical with the intent of providing structural proof of individual molecules.” Purely analytical techniques include highly sensitive, high-resolution, and multidimensional methods for rapid assessment of isomeric heterogeneity, as well as tools capable of separating molecules or providing high-resolution spectroscopic information to provide evidence of purity. Frequently, molecules may co-migrate using one separation system, and therefore co-migration in a single system is not a valid criterion for purity or proof of structure. Possible options might include systems that concatenate multiple separation modes and other tools that can discriminate isomeric glycan molecules.

Separation methods coupled with methods that provide structural information include automated multidimensional methods paired with tools for primary structural determination. Whether samples are analyzed online or fractions are collected and analyzed afterward, rapid, automated, multidimensional separations that can generate pure samples with high probability and that can be applied as broadly as possible to structures from animal, plant, bacterial, and fungal sources would be useful for the field. Current methods are very laborious and time consuming: multiple fractions are collected from one chromatographic or electrophoretic system and concentrated, the molecular components in these fractions are applied to another chromatographic separation, the many more fractions that result are concentrated again and applied to another column, and so forth. The purified samples can then be subjected to tools that assess purity and establish primary structures.

Owing to the great diversity of natural structures in different taxonomic kingdoms, primary structural determination or “sequencing” takes on a completely different nature when quite possibly none of the same monomers might be found between two organisms from different kingdoms or even within the same kingdom (such as prokaryotes). Furthermore, there are probably many biological systems, unexamined as yet, in which novel monomers will be discovered. Glycan structures contain a large number of stereocenters, and detailed assignment of their structures is currently best achieved by a combination of methods that provide orthogonal structural information.

Many developments in techniques that can prove structure will be needed, which may include but need not be restricted to the following:

5.2.1.1. Methods for complete disassembly of larger glycans to their monomers and better tools for determination of monosaccharide structures

This includes all aspects of their structures: (1) confident assignment of their stereochemistry; (2) confident assignment of modifications at different hydroxyl positions, such as acyl or sulfate groups; (3) determination of their enantiomers (D- or L-forms); and (4) confident determination of whether sugar components are aldoses or ketoses. Better techniques to quantitate monosaccharides, with a wide availability of standards, are needed for a wide variety of monosaccharide compositional analyses in the future.

5.2.1.2. Methods for determination of sugar ring forms (five- or six-membered) in oligo- and polysaccharides

In nature, both furanoside (five-membered) and pyranoside (six-membered) ring forms are present as glycosides in a wide variety of larger glycan molecules, and both types are frequently observed in the same molecule. Currently, the only techniques to maintain ring structures after depolymerization involve permethylation (making methyl ethers at every free hydroxyl group), followed by reductive depolymerization—a very old technique (Gray 1990). More sensitive and rapid methods are needed to assign ring forms with confidence and to determine their locations in larger unknown glycan molecules.

5.2.1.3. Methods for determination of anomeric configuration

In larger molecules, individual sugars cyclize and generate an asymmetric center (the anomeric position). There is currently no hydrolytic/solvolytic method to generate monomers that maintains this asymmetry, and only three existing tools can even address this on intact oligosaccharide/glycoconjugate molecules. Nuclear magnetic resonance (NMR) is the most general approach, and various NMR experiments can be used to assign anomeric configuration generally, to discriminate between aldoses and ketoses, and to distinguish five- and six-membered sugar rings (Duus et al. 2000; Bendiak et al. 2002; Coxon 2009). A current drawback is sensitivity. Glycosidases can distinguish anomeric configuration much more sensitively through release of a sugar from a larger molecule having a sensitive fluorescent tag. Current drawbacks to their general use are (1) in many new systems being studied, many of the glycosidases are either not known or not commercially available; (2) contiguous regions of the same sugar having the same anomeric configuration, or branched structures decorated with the same sugar having the same anomeric configuration, can result in the release of several monosaccharides, so that the specific linkages between them cannot be established; (3) in many systems the absolute specificity of glycosidases for underlying structures is not known, so they may fail to act on their substrate, even though the anomeric configuration is correct; (4) in many systems the complete specificity for all isomeric sugar variants is unknown or has not been tested; and (5) glycosidases need to be pure, because any contamination with other glycosidases, even in small amounts, can lead to erroneous conclusions.

Mass spectrometry of medium-sized oligosaccharides has the potential for general determination of anomeric configuration, through higher stages of ion isolation/dissociation, whereby smaller fragments of the larger molecules can be compared to a much more limited number of possible isomeric variants, such as, for example, the glycosyl-glycolaldehydes (Fang and Bendiak 2007). Other potential fragments include small glycoside fragments derived from derivatized molecules, such as permethylated or other peralkylated derivatives (Mendonca et al. 2003; Zhang et al. 2005). Use of these or similar small fragments to differentiate anomeric configuration generally will require new developments in mass spectral capabilities and more widely available standards for spectral comparisons.

Perhaps more importantly, any other analytical techniques that could address the general issue of confident assignment of anomeric configurations of monosaccharides in larger glycan structures would be highly desirable. While improved existing technologies that can solve this problem are desirable, completely novel approaches also are needed, particularly at very high (i.e., single-cell) sensitivities.

5.2.1.4. Methods for assigning linkages between sugars

Currently, permethylation analysis (Hakomori 1964; Lindberg and Lonngren 1978), NMR through peracetylation with 13C-labeled isotags (Bendiak et al. 2002), and mass spectrometry (Zaia 2004; Morelle and Michalski 2005; Park and Lebrilla 2005) can provide information about linkage sites between sugars in larger molecules, each with its own limitations. Permethylation analysis, which cleaves all glycosidic bonds after making methyl ethers at every free hydroxyl group, cannot determine which sugars are linked to which in a larger molecule, although it can provide much important information about stereochemistries, ring forms of sugars, and substitution positions of individual monomers. NMR can provide the greatest amount of information about linkages, particularly after derivatization with 13C-labeled derivatives at all hydroxyl groups or through direct 13C isotopic enrichment via three-bond 1H-13C couplings directly across glycosidic linkages. The main limitation in many systems is sensitivity. Mass spectrometry usually yields a great deal of linkage information based on multiple cleavage methods, either before or after derivatization, and is currently far more sensitive than NMR (Dell et al. 1994; Dell and Morris 2001; Jang-Lee et al. 2006). Detailed studies of the dissociation of model glycan standards are needed, particularly with specific isotope labels (such as 2H, 13C, and/or 18O), to firmly establish and further understand dissociation pathways and mechanisms. A current limitation is the difficulty of synthesizing many of the isotopically labeled standards. These techniques may not be directly applicable to very small (single-cell) quantities in the future, so additional methods of greater sensitivity that can address linkage positions between sugars in oligo- and polysaccharides would be important to develop.

5.2.1.5. D- versus L-enantiomers

In nature, both D- and L-enantiomers of sugars are found, and depending on the organism, either or both types can be present (Kremer and Gallo-Rodriguez 2004). Currently, the only methods that can address this involve depolymerization of larger oligosaccharides or polysaccharides to their monomers, followed by either derivatization (often with chiral reagents) and/or separations to compare the monosaccharides or their derivatives to known enantiomeric standards. It is not currently possible to prove the location of D- or L-sugars in a general way in an intact molecule without first depolymerizing the molecule to its monomers and establishing their enantiomers. New methods are needed.

5.2.1.6. Aldoses versus ketoses

Both types of these monomers are found in many living systems, and current methods to determine their structures and locations in a larger molecule involve either depolymerization or NMR, each having specific limitations. Confident assignment of these structural variants, particularly determining their specific locations in larger molecules, is needed at sensitive levels, and any novel procedures that could acquire this information would be useful and important.

It should be emphasized that, while the aforementioned tools are those currently available for primary structure determination, development of new structural techniques that could confidently determine glycan structures, particularly techniques that could provide much higher sensitivities, would be desirable. Long-term developments that might use completely novel approaches to primary structural determination would be of value.

5.2.1.7. Tools for determining three-dimensional structures of glycans and higher-order superstructures

Almost invariably, glycans and glycoconjugates reside in an aqueous environment in nature and exist in either a soluble state with individual sugars extending into the solvent or, particularly for polysaccharides, an insoluble state having unique interactions in a higher-order three-dimensional structure. For all glycan molecules, their overall structures ultimately determine their biological roles.

It is important to understand the conformations and three-dimensional solution structures of glycans. In their interactions with proteins, usually structures having from three to six monosaccharide units, referred to as “determinants,” define a three-dimensional structure that interacts with a unique protein pocket or binding site. Sometimes, multivalency, in either the proteins or multivalent presentation of determinants, plays an important role in the strength and specificity of their interactions. Understanding these interactions and how the glycan or protein conformation may be modified on binding is essential for a detailed understanding of their biological roles. Similarly, with larger cell wall polysaccharides or glycosaminoglycans, their three-dimensional structures or “superstructures” are often arranged in unique ways, a function of the primary structures themselves. These three-dimensional arrangements render physical properties that are essential for organisms to function. For instance, cartilages come in several variants that differ in their glycosaminoglycan types and relative contents. Hyaline cartilage is essential in part for its compression characteristics and is found on the articulating ends of long bones. Fibrocartilage, primarily in ligaments, is very resistant to shear forces. Chitin is an abundant polysaccharide comprised of N-acetylglucosamine and is used to provide structural strength in many invertebrates. Cellulose is especially suited to provide structural strength, with some flexibility, for plants. Bacteria and many colonial marine algae contain glycans of a widely differing nature that play important roles in their survival. Some higher-order structures provide unique physical states ranging from gel-like to resin-like to stiff, cross-linked, and yet hydrated solids. Determining their three-dimensional structures is essential in understanding both their physical properties and their biological functions.

The two major tools for atomic-level characterization of the structures of glycans are NMR spectroscopy and crystallography. Several other types of analyses, such as rheology and other measurements of physical/mechanical behavior under shear or compression, are useful in studying their physical properties. In addition, several other types of spectroscopy can be used to evaluate how physical properties differ among bulk states of polysaccharide polymers, although these usually do not provide information about the detailed atomic-level structures of those states.

NMR spectroscopy can provide detailed three-dimensional information about oligosaccharides and polysaccharides at the atomic level. While some structural information can be obtained through solution NMR using naturally abundant magnetically susceptible nuclear isotopes (nearly 100 percent for 1H and about 1 percent for 13C), much more information can be obtained with isotopic enrichment (Bose-Basu et al. 2000; Martin-Pastor et al. 2003; Yu and Prestegard 2006), which provides a number of important advantages for acquiring information about solution structures (conformations) and dynamics (internal motions) of the molecules. What is currently difficult is introducing isotopes of interest at any desired location in any glycan molecule. In most cases this is a “synthetic project.” Hence, methods for rapid introduction of isotopes selectively at any location in glycan molecules would be of great importance. This involves not only enrichment with 13C but also, in amino sugars, enrichment with 15N; in addition, the introduction of a deuterium atom (2H) at any position would be of considerable value in studies of molecular dynamics. Simple isotopic enrichment of specific desired atoms in an oligosaccharide is currently the major barrier in NMR determination of larger glycan molecules. If this impediment were lifted, a great deal more information about glycan three-dimensional structures could be obtained rapidly. Advances in NMR techniques themselves, both for liquid (dissolved) and solid samples, are also needed.

Crystallographic techniques are also important to provide three-dimensional structural and “superstructural” information about glycans. Many small sugar structures have been crystallized, frequently in the presence of ions (Jeffrey and Sundaralingam 1985). Crystallography has already provided valuable information about their structures that NMR cannot ascertain. For example, crystallographic studies often indicate the participation of water molecules in hydrogen-bonded networks not observed by NMR. Also, crystallography has provided detailed information about bond lengths and bond angles, which can differ slightly among six-membered sugar rings depending on their stereochemistry. Crystallography has also provided evidence for interesting intermolecular interactions, often through hydrogen-bonded networks, for higher-order structures, including helical structures for repeating polysaccharides and interactions between helices. Without doubt, development of crystallographic methods tailored to analyses of complex glycans will be important, and future advances in crystallographic techniques will provide valuable information about three-dimensional structure.

“Superstructural” information about polysaccharides in large structural complexes and composites will be important to advance. Additional techniques, spectral or otherwise, can provide information about higher-order structures, although sometimes not at atomic-level resolution. Nonetheless, all such techniques and the development of new techniques are useful for the field. For example, for plant cell wall characterization, advances in dual-axis electron tomography are needed to investigate both the three-dimensional organization of cellulose microfibrils in the cell wall and the configuration and linkages between the different cell wall components. For cellulose nanomaterials, three-dimensional characterization is needed to better understand their structures because several factors may cause deviations from the idealized crystalline structures: statistical variation in crystallite formation during biosynthesis, effects of the extraction process, a large surface area–to–volume ratio, and the coexistence of crystalline polymorphs and amorphous cellulose. Understanding how all of these variations alter the mechanical properties of cellulosics will be important in understanding how to break them down effectively as a usable energy source. Percent crystallinity and cellulose polymorphs are typically measured with x-ray diffraction, Raman spectroscopy, and solid-phase NMR spectroscopy; however, the calculated percentage (crystallinity or polymorph fraction) differs from technique to technique, depending on the measurement method and on the assumptions used in the calculation. When these measurements are combined with atomistic modeling, additional insight can be gained, for instance by showing that the hydrogen-bonding configuration of surface chains differs from that of interior chains.
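The dependence of percent crystallinity on the calculation method can be sketched for x-ray diffraction data alone. The example below compares two commonly described estimates: a peak-height ratio (often called the Segal method) and a ratio of integrated areas after peak deconvolution. All intensities and areas are hypothetical numbers chosen for illustration, and the function names are not from any library.

```python
def crystallinity_peak_height(i_crystalline_peak, i_amorphous_min):
    """Peak-height estimate: 100 * (I_peak - I_am) / I_peak.

    i_crystalline_peak: height of the main crystalline reflection.
    i_amorphous_min:    height at the minimum assigned to amorphous material.
    """
    return 100.0 * (i_crystalline_peak - i_amorphous_min) / i_crystalline_peak


def crystallinity_peak_area(crystalline_area, amorphous_area):
    """Area estimate: crystalline area as a percentage of total fitted area."""
    return 100.0 * crystalline_area / (crystalline_area + amorphous_area)


# Hypothetical values for one diffractogram (arbitrary units).
height_estimate = crystallinity_peak_height(i_crystalline_peak=1000.0,
                                            i_amorphous_min=250.0)
area_estimate = crystallinity_peak_area(crystalline_area=5500.0,
                                        amorphous_area=4500.0)

print(f"peak-height estimate: {height_estimate:.1f} percent")
print(f"peak-area estimate:   {area_estimate:.1f} percent")
```

With these invented inputs the same sample would be reported as 75 percent or 55 percent crystalline, depending only on which assumption is used to separate the crystalline and amorphous contributions.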

Advances in techniques that provide information about higher-order organization of a large number of polysaccharides will be of importance in a wide variety of industrial, medical, materials science, and energy-sector applications. A great deal is currently unknown and remains to be learned about glycan structures and their molecular interactions with proteins and other molecules. Therefore, advances in these techniques toward higher sensitivities and faster and more confident assessments of their three-dimensional structures are needed.

5.2.2. Analysis of Glycoconjugates

Glycans are often conjugated to other molecules, and the most abundant and widespread of these are glycoproteins and glycolipids. Glycoproteins can be broken down readily to glycopeptides using peptidases. This increases the complexity of analyses because a single glycan can now be associated with a number of peptides, or a number of glycan structures (including isomeric species) can be associated with each peptide. This further complicates the separation problem because more individual molecular species are present and each is present in less abundance. To date, analysis has rarely been performed for a single glycoprotein for which all glycan structures associated at different glycan-peptide linkage positions have been determined unambiguously. Having detailed structural information on glycoproteins is an important part of studies to address their functions in vivo. Similarly, glycolipids have glycan structures linked to lipid moieties at their reducing end glycosides, and these lipid groups can be variable in structure. Both the nature of the glycan structures and the nature of the variations in these lipid groups are likely to be essential for understanding glycolipid functions. To routinely achieve a level of rapid and sophisticated separations and structural analysis with many glycoproteins and glycolipids will require improved technologies.

In addition to primary structural analysis, three-dimensional structural analysis of molecules such as glycoproteins is used. X-ray crystallography is one method that has been widely used to determine the structures of proteins. Some glycoproteins, depending on the nature of crystal contacts, have been crystallized, with the caveat that their glycan structures (or, sometimes, lack of observable structures) might result from their packing into unusual (or multiple) conformers compatible with crystal formation. Thus, electron density that is either observed or not observed for the glycan portions of glycoproteins can be prone to interpretive errors as compared to their true solution structures.

NMR spectroscopy has the potential to determine the solution-state glycan conformations of glycoproteins under physiological conditions. However, this requires isotopic labeling (13C, 15N) of the sugar component and is currently limited to glycoproteins with molecular weights below ∼50,000. As mentioned earlier for three-dimensional structures of the oligosaccharides themselves, similar techniques may be applied to isotopically labeled structures attached to proteins, and isotopic labeling of the protein portion is also desirable, either together with or independently of labeling of the glycan moiety. Currently, there are two major hurdles to performing these studies: (1) isotopic labeling of the glycan structures is far from straightforward and (2) isolating a glycoprotein that carries a single, defined glycan structure at each glycosylation site (sometimes at multiple sites) is currently difficult to achieve. In the long term these developments will be essential. The current approach has largely been to ignore the glycans either through expression of the protein in prokaryotic systems or in some cases through site-specific deletion of the glycan–amino acid linkage site. However, this approach will inevitably raise the question of whether the resulting protein structure is correct or simply convenient to obtain; if the latter, all glycosylated proteins or proteins having other posttranslational modifications may need to be solved again to yield the correct structures. Because even a simple phosphorylation can have dramatic effects on a protein's activity or function, glycosylation would be expected to have equal or even more dramatic effects.

How might specific glycan structures be generated at specific protein sites? This is a current challenge to the scientific community but will be important in the future in understanding the specific roles played by unique glycans linked at unique protein sites.

In addition to the many high-resolution tools and techniques described above, valuable information about glycan structure can be obtained through the use of molecules that bind glycans with known specificities. These include glycan-binding proteins, lectins, and antibodies. The structural information such methods provide is of lower resolution than that from techniques such as NMR, but one advantage of these molecules is that they permit interrogation of intact glycoconjugates and even of intact cells and tissues. Such approaches are complementary to those involving glycan release and chemical and physical analysis. Additional tools such as glycan-binding molecules with fully characterized binding specificities can also provide a powerful route to a better understanding of glycan structure-function relationships.

5.2.3. Analysis of Glycan-Protein Interactions

5.2.3.1. Glycan-protein interactions—the search for endogenous cognate ligands

For many glycoproteins the glycans contain crucial biological information, and that information is decoded, often at the cell surface, by glycan binding proteins. A very important goal is to develop methods that can assess the glycan binding specificities of various proteins and that can rapidly isolate the glycan component(s) having the highest binding affinities.

Frequently, the binding pockets in proteins have a specificity for glycan structures from about four to six sugars (their determinant), but multivalency of binding can mean that more than one of these determinants must be displayed on a branched glycan molecule for highest binding affinity. There are two general approaches: (1) Separate a natural series of oligosaccharides isolated from some source into a number of fractions, many of which may not be pure. Immobilize the fractions (i.e., at their reducing ends) in an array. Then test for binding of a protein to all of the immobilized fractions in the array. Then, for positive fractions, further fractionate the sample until individual natural glycan molecules can be identified that interact with a specific glycan-binding protein. In a variant of this approach, the glycan-binding protein is immobilized on a column, glycans are passed over the column and eluted with a hapten (usually a simple sugar such as a methyl glycoside), and the structures of the glycans that preferentially bind to the immobilized protein are then determined. (2) A second approach is to synthesize a number of glycan substructural determinants and immobilize the synthetic molecules on a binding array surface. This approach has the advantage that the precise structure at each array position is known. However, it has the disadvantage that a natural structure might have a much higher affinity for the protein, either because a previously unknown structure binds more tightly than any structure on the synthetic array or because a multivalency of determinants on a single glycan molecule results in higher affinity. Both approaches are valid, though, for making the biological connection between specific glycans and the proteins that bind them.

5.2.3.2. Detailed studies of atomic-level glycan-protein interactions

Current techniques to evaluate atomic-level protein-ligand interactions are primarily crystallographic or involve NMR, but difficulties can arise with either technique. Frequently, the most time-consuming aspect is obtaining or preparing the glycan structures involved in the interactions, as well as variants of those structures to assess binding efficacy and multivalency. In the case of NMR, preparation of specific isotopically labeled glycans is frequently rate limiting. Depending on the specific protein and glycan structure interacting, either technique may fail to yield information. For crystallography, this may be due to an inability to obtain crystals or to crystals that do not diffract well enough to permit atomic-level assignment; for NMR, it may be due to the size or overall flexibility of the protein-glycan complex. Such atomic-level information is vitally important for a detailed understanding of protein-glycan interactions. Further development of these or any other techniques that can provide information about such binding interactions will be of value.

5.2.4. Analysis of the Roles of Some Glycans in Metabolic Pathways Related to Energy Metabolism

A vital role of glycans centers on the involvement of glucose in the glycolytic pathway, in utilization of its carbons in the Krebs citric acid cycle, and in subsequent production of adenosine triphosphate, the currency of cellular energy, through oxidative phosphorylation. Likewise, storage of glucose as a retrievable polymeric energy source, as glycogen or (in plants) starch, forms the basis of the energy humans use to survive on a daily basis.

Energy needs differ from tissue to tissue, and many organisms can use sugars other than glucose through metabolic conversion. Much is still unknown about the use of alternative energy sources by many microorganisms, yet understanding these metabolic pathways may provide important keys to biomass conversion. A great many potential alternatives to the use of cellulose as an energy source may be feasible through future studies of additional bacterial or algal metabolic interconversion systems, and their efficiencies, rates of carbon dioxide fixation, and potential for large-scale conversion to biofuels need to be carefully examined. Each system will require analytical capabilities for studies of accumulated glycan structures and/or metabolic intermediates en route to potential molecules usable as biofuels.

In humans a number of metabolic diseases form the basis of a relatively recent area of science, metabolomics (Kaddurah-Daouk et al. 2008). Analytical tools for understanding the balance of metabolic intermediates in healthy individuals and deficiencies in patients, together with whole-body imaging of some metabolites (see below), are important needs and challenges related to our use of sugars as energy sources. A number of glycosylation deficiency disorders (>65) that also affect synthesis and degradation of glycoprotein, glycolipid, and glycosaminoglycan glycan structures have been described elsewhere (Freeze and Sharma 2010), and techniques to diagnose and further detail metabolic intermediates (partial glycan structures) in these disorders form a starting point for a deeper understanding of the roles of glycans in a number of disease states.

5.2.5. Analysis Techniques That Relate Glycan Structures and Their Synthetic Enzymes

Ultimately, glycan structures in organisms are related to expression of the enzymes that synthesize and degrade them—glycosyltransferases, glycosidases, and other proteins such as nucleotide-sugar transporters, sulfotransferases, and a number of other proteins involved in their compartmentalization in cells. These are encoded by genes that are differentially expressed in many cells, and a detailed understanding of this expression as well as mechanisms that control their breakdown will be valuable in understanding the biological functions of the glycans. Transgenic knockouts are one important tool to investigate the functions of these genes and have already proven valuable in determining the requirements of a number of glycosylation enzymes in developmental processes and the roles they play in some human diseases (Hennet and Ellies 1999; Haltiwanger and Lowe 2004).

While these genetic ablations have determined the crucial roles of glycans through dramatic phenotypic abnormalities, development of tools for a more detailed understanding of their functions in vivo is important. Tools for organ-specific, cell-lineage specific, and cell–type–specific ablation of glycosyltransferases and glycosidases will be needed in order to examine in more detail the specific effects they have in the functioning of specific differentiated cells. Furthermore, the informational role of glycans in intercellular recognition events will require an understanding of (1) the location of expression of specific glycan structures to the level of individual cell types and (2) the location of expression of the proteins that bind to them (endogenous mammalian lectins). Imaging techniques (see below) will be important for establishing localization of both binding partners between cells where suspected interactions occur. Detailed in vivo analysis of these interactions will require a number of technical developments, in both cell-specific knockout techniques and glycan imaging techniques.

5.2.6. Analysis of Locations of Specific Glycan Structures in Organisms Through Various Imaging Techniques

In studies of RNA expression, in situ hybridization has proven invaluable in localizing specific RNA transcripts in cells and in identifying their expression in tissue patterns in organisms. Similarly, in protein expression, detailed localization of proteins has been made possible through carefully studied monoclonal antibodies having low cross reactivity and high specificity for individual protein molecules or through expression of chimeric proteins tagged with fluorescent proteins. These tools alone have been extremely important in characterizing cellular and subcellular locations of these molecules and in providing essential insights into their functions in higher organisms.

Monoclonal antibodies unique to specific glycan determinants are well known, but arrays of many more that are specific to unique glycan structures are needed as biological probes. A large group of monoclonal antibodies specific for plant glycan determinants are being used for localization and glycome profiling (Avci et al. 2011). Potentially, a large array may be developed through syntheses of a number of glycan determinants and selection of good monoclonals having high specificity and binding affinity to unique glycan structures, including those of glycoproteins, glycosaminoglycans, and glycolipids, in addition to glycan substructures unique to pathogenic organisms that might be valuable for their diagnosis or therapy. Antiglycan antibodies can include both IgM and IgG classes, although there is a general tendency for IgM to bind glycans over IgG, a factor to be considered in array development.

More recently, RNA and DNA molecules (aptamers) have been shown to be selectable for unique binding properties to many molecules, and they also have the potential to provide important selective binding probes for many unique glycan structures. Aptamers have been explored little as yet for this purpose, but it seems feasible that they might provide another battery of arrayed probes useful for imaging glycans. These and any other methods for uniquely identifying single glycan species at specific cellular locations will be of great value.

Recent developments in other imaging techniques have occurred as well. These include mass spectral imaging of peptides and lipids in tissue slices and could conceivably be extended to a number of glycosylated peptides and glycolipids assessed through localization of selected precursor ion masses and unique glycan neutral losses. Mass spectrometry imaging techniques may enable analysis of a broader range of glycan structures simultaneously in tissue sections, and further developments in mass spectral imaging techniques as applied to glycopeptides or glycolipids are needed. While this comes with the caveat that more than one glycan isomer could contribute to a select glycopeptide or glycolipid precursor ion mass, it would still be of considerable value because many precursor masses could be independently studied, some attributable to unique peptide-glycan or lipid-glycan combinations.
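
As a minimal sketch of how glycan neutral losses might be used to flag glycosylated species, the example below computes the fragment m/z expected after loss of a single monosaccharide residue from a precursor ion and checks observed fragments against those values. The residue masses are standard monoisotopic values; the precursor and fragment m/z values, the mass tolerance, and the assumption that the fragment retains the full precursor charge are illustrative choices rather than part of any specific imaging workflow.

```python
# Monoisotopic masses (Da) of common monosaccharide residues.
RESIDUE_MASS = {
    "Hex": 162.0528,     # hexose, e.g., mannose or galactose
    "HexNAc": 203.0794,  # N-acetylhexosamine
    "dHex": 146.0579,    # deoxyhexose, e.g., fucose
    "NeuAc": 291.0954,   # N-acetylneuraminic acid
}

def composition_mass(composition):
    """Summed residue mass for a composition such as {"Hex": 5, "HexNAc": 2}."""
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items())

def fragment_mz_after_loss(precursor_mz, charge, residue):
    """m/z of the fragment left after a neutral loss of one residue.

    Assumes the fragment keeps the full precursor charge.
    """
    return precursor_mz - RESIDUE_MASS[residue] / charge

def matching_losses(precursor_mz, charge, fragment_mzs, tolerance=0.01):
    """Residues whose neutral loss explains at least one observed fragment."""
    matches = []
    for residue in RESIDUE_MASS:
        expected = fragment_mz_after_loss(precursor_mz, charge, residue)
        if any(abs(mz - expected) <= tolerance for mz in fragment_mzs):
            matches.append(residue)
    return matches

if __name__ == "__main__":
    # Hypothetical doubly charged precursor and two observed fragments.
    print(matching_losses(1000.0, 2, [918.9736, 898.4603]))
```

Because the calculation depends only on composition, isomeric glycans with the same residues give identical masses and losses, which is the caveat noted above.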

Whole-body imaging techniques also might hold enormous promise for nondestructive molecular imaging in living animals, including humans. Magnetic resonance imaging with pulse sequences uniquely tailored to imaging specific metabolites or unique individual molecules such as ATP and specific oligosaccharides could be feasible in the future, studied initially through isotopically enriched molecular species in animal studies. 2-18F-2-deoxyglucose is already used routinely in positron emission tomography to study glucose uptake in various tissues under different metabolic states and to investigate different neurological conditions (Zijlstra et al. 2006). These and other whole-body imaging techniques need to be developed initially in animal studies, followed by potential applications in human diagnostics. There is great potential for imaging techniques in nondestructively determining the locations of glycan molecules in whole-body imaging protocols. Similar approaches can be used for in vitro analyses.

In summary, any novel imaging techniques that could address histological specimens, tissue slices, or whole-body living organisms and provide information about the locations and/or superstructural organization of specific glycan molecules will be of value in the future.

5.2.7. Key Messages on Glycan Analysis

Although analysis of glycans is challenging because of the number and diversity of glycan structures, a wide range of helpful tools exist. Perhaps not surprisingly, these techniques have different advantages and limitations, and no single analytical technique will be able to provide all desired information. The most suitable technique will depend on the nature of the question at hand and the essential information required. The different types of techniques also provide information at different levels of analytical specificity; as discussed further in Section 5.6, specification of the level of analysis as part of bioinformatics and other efforts is necessary. Many techniques are currently available for polymer and nanoparticle characterization, including rheology and light-scattering techniques, and are used heavily in many fields, including materials engineering. There is a large array of possible data that can be obtained with current analysis tools, depending on the necessary level of analysis. However, further developments in such areas as separation techniques, structural analysis, and the use of glycan-binding proteins and antibodies of known binding specificities may all expand the analytical toolkit to advance glycoscience knowledge.

5.3. COMPUTATIONAL MODELING

Computational modeling in biology has been an active area of research for many years, particularly in terms of protein structure modeling, molecular dynamics simulations, protein docking, and virtual drug screening. However, for any of these methods to be deemed useful, modeling must be able to accurately predict structures that are experimentally verifiable. Developing such models for glycan structures is particularly challenging but is also an important component in the glycoscience toolkit.

5.3.1. Computational Modeling of Oligo- and Polysaccharides

SWEET-II is one of the first Web-based tools for computational modeling of glycan structures. It is provided as a part of the glycosciences.de website, where a number of tools for analyzing glycans in three-dimensional space are provided. Molecular dynamics simulations of oligosaccharides in explicit solvent have also been performed that provide an analysis of the trajectories of complex oligosaccharide or glycoprotein structures, and quantum mechanical (ab initio) methods are also used for modeling of glycan conformation. However, these methods are computationally demanding and cannot be used routinely to study or predict the three-dimensional structure of complex glycans. The development of computationally efficient and accurate simulations of complex glycans is therefore a major challenge that needs to be addressed.

A number of glycan force fields have been developed, including GLYCAM, AMBER, CHARMM, OPLS-AA, GROMOS, MM4, and SPASIBA (Hancock et al. 2006; Fadda and Woods 2010). These force fields vary according to which atomic interactions are modeled, the mathematical form of those interactions, and the fit of the parameters to the resulting mathematical expressions. In addition, the details of a given force field may vary significantly from one version or release to the next, and the force fields are often subject to simplifications or approximations to improve computational efficiency. As a result, each force field provides a unique parameter set resulting from the various techniques invoked to improve performance or to ease implementation, transferability, or generality to expand beyond glycans. Different protocols also have different levels of compatibility with other biomolecules and solvent models, indicating the difficulty in reenacting real-world situations computationally.

For any modeling software or algorithm to be useful, it must be able to accurately predict known structures. The entropic components of solvent order and disorder may be extremely difficult to estimate now, but this is a necessary challenge to be resolved in glycan modeling. Moreover, another challenge lies in modeling even small molecules, such as simple methyl glycosides, that can serve as gold standards for modeling measurable spectral parameters. Although computational performance will likely improve with technological progress, new algorithms and software for such modeling challenges are greatly needed.

5.3.2. Protein-Glycan Interactions

One of the major challenges in molecular modeling is the development of efficient and accurate methods to estimate the binding affinity between proteins and glycans. Many protein-docking methods have been applied to study protein-glycan interactions. However, there are factors involved in these interactions, such as bridging water molecules and CH-pi interactions, that are often ignored in these simulations for a variety of reasons. Further research in this area is of great interest and would aid in our understanding of glycan function (Frank and Schloissnig 2010). Another difficulty with predicting these interactions is that binding affinity does not necessarily increase with larger oligosaccharide size. Moreover, glycan-protein interactions are intrinsically more dynamic than other protein-ligand interactions because their affinity for one another arises from several relatively weak interactions. As a result, accurate prediction of binding affinity is a challenging task that remains to be solved.
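
The relationship between several weak interactions and overall affinity can be sketched with the standard thermodynamic relation between the dissociation constant and the binding free energy, ΔG° = RT ln(Kd), relative to a 1 M standard state. The example below is a deliberately idealized illustration: it assumes that the free energies of individual contacts simply add, which ignores entropic penalties, cooperativity, and solvent effects, and the millimolar Kd values are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)

def binding_free_energy(kd_molar, temperature=298.15):
    """Standard binding free energy (kJ/mol) from Kd, 1 M standard state."""
    return R_KJ * temperature * math.log(kd_molar)

def dissociation_constant(delta_g_kj, temperature=298.15):
    """Kd (M) corresponding to a standard binding free energy in kJ/mol."""
    return math.exp(delta_g_kj / (R_KJ * temperature))

def combined_kd(individual_kds, temperature=298.15):
    """Kd if the free energies of the individual contacts were purely additive.

    This is an idealized upper bound on the affinity gain, not a prediction.
    """
    total = sum(binding_free_energy(kd, temperature) for kd in individual_kds)
    return dissociation_constant(total, temperature)

if __name__ == "__main__":
    weak = 1e-3  # hypothetical millimolar contact
    print(f"single contact: {binding_free_energy(weak):.1f} kJ/mol")
    print(f"two additive contacts: Kd = {combined_kd([weak, weak]):.1e} M")
```

Under this idealization two millimolar contacts would combine to micromolar affinity; measured affinities typically fall short of such estimates, which is one reason accurate prediction remains difficult.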

5.3.3. Atomistic Modeling of Crystalline Cellulose

Since the 1980s, atomic-scale modeling of cellulose has been used to complement experimental measurements of individual cellulose crystals. Such studies have advanced our understanding of cellulose structure, energetics, mechanical characteristics, and interfacial properties involving liquids, a given chemical species, cellulose chains or surfaces, and enzymes. Atomistic modeling also provides a fundamental understanding of the atomic-scale origins of these characteristics. However, the predictions made by these models are limited, because they are highly dependent on the accuracy of the force fields that describe atom-atom interactions and on the accuracy of the structural information at the atomic level of cellulose nanomaterials. To improve understanding of the construction and deconstruction of cellulose for biofuels, and to better tailor the properties of cellulose nanomaterials extracted from biomass, three key challenges need to be addressed:

  • development of more accurate force fields for cellulose with experimental validation of input parameters;
  • improved structural-property characterization and linkages; and
  • improved interaction simulations involving glycans with liquids, a given chemical species, cellulose chains or surfaces, and enzymes.

5.3.3.1. Force fields for cellulose

For cellulose a force field must accurately describe the stretching, bending, and torsion of covalent bonds, electrostatic interactions, van der Waals forces, and hydrogen bonding. The force fields most commonly used for cellulose modeling are MM2/MM3, GROMOS, CHARMM, CVFF/PCFF/COMPASS, AMBER, Dreiding, and COSMOS. Among the numerous differences between force fields, one of the most important for cellulose is hydrogen bonding, because it is critical for simulating structural and mechanical properties, as well as the interaction of cellulose nanomaterials with the environment. Both implicit and explicit hydrogen bond models have been developed using a variety of combinations of parameters to describe force fields. However, the difficulty in utilizing this information is the lack of a consistent definition of a hydrogen bond, which has typically been identified by the distance between hydrogen and acceptor and sometimes by the angle between donor, hydrogen, and acceptor. Although the hydrogen bond model selected can have a significant influence on simulation predictions, the advantages of using one approach over another have not yet been conclusively determined.
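
The consequence of inconsistent hydrogen bond definitions can be seen in a minimal geometric sketch. The function below classifies a donor-hydrogen-acceptor triple using a hydrogen-acceptor distance cutoff and, optionally, a minimum donor-hydrogen-acceptor angle. The cutoff values (2.5 Å, 120°, 150°) and the example coordinates are assumptions chosen for illustration and are not taken from any particular force field or study.

```python
import math

def angle_at_hydrogen(donor, hydrogen, acceptor):
    """Donor-hydrogen-acceptor angle in degrees, measured at the hydrogen."""
    to_donor = [d - h for d, h in zip(donor, hydrogen)]
    to_acceptor = [a - h for a, h in zip(acceptor, hydrogen)]
    dot = sum(x * y for x, y in zip(to_donor, to_acceptor))
    norm = math.hypot(*to_donor) * math.hypot(*to_acceptor)
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cosine))

def is_hydrogen_bond(donor, hydrogen, acceptor,
                     max_distance=2.5, min_angle_deg=None):
    """Geometric hydrogen bond test.

    max_distance: hydrogen-acceptor cutoff in angstroms (assumed value).
    min_angle_deg: optional donor-hydrogen-acceptor angle cutoff.
    """
    if math.dist(hydrogen, acceptor) > max_distance:
        return False
    if min_angle_deg is None:
        return True
    return angle_at_hydrogen(donor, hydrogen, acceptor) >= min_angle_deg

# Hypothetical geometry: H...A distance of 2.0 angstroms, D-H...A angle of 130 degrees.
DONOR = (0.0, 0.0, 0.0)
HYDROGEN = (0.96, 0.0, 0.0)
ACCEPTOR = (0.96 + 2.0 * math.cos(math.radians(50.0)),
            2.0 * math.sin(math.radians(50.0)),
            0.0)

if __name__ == "__main__":
    print(is_hydrogen_bond(DONOR, HYDROGEN, ACCEPTOR))                       # distance only
    print(is_hydrogen_bond(DONOR, HYDROGEN, ACCEPTOR, min_angle_deg=120.0))  # lenient angle
    print(is_hydrogen_bond(DONOR, HYDROGEN, ACCEPTOR, min_angle_deg=150.0))  # strict angle
```

The same geometry counts as a hydrogen bond under the distance-only and lenient-angle definitions but not under the strict-angle definition, so hydrogen bond counts reported by different studies are not directly comparable unless the criteria match.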

5.3.3.2. Predicting cellulose nanomaterial properties

Molecular modeling has been used to predict a variety of cellulose material properties, including elastic modulus, thermal expansion, and Poisson's ratio. The most frequently predicted properties are elastic properties because they can be calculated using molecular models relatively easily and are experimentally measurable. Molecular simulation has shown that numerical removal of hydrogen bonds can cause predicted elastic properties to decrease on the order of 50 to 60 percent. In addition, molecular modeling has shown that cooperative hydrogen bonding plays such a critical role in the behavior of cellulose that omitting interchain hydrogen bonding, as is necessarily the case for a single cellulose chain, will affect intrachain hydrogen bonding.

5.3.3.3. Predicting interface properties of cellulose

Models have been used to investigate the interaction of cellulose and other glycans with other materials, primarily liquid solvents and other polymeric materials. Most solvent studies are focused on water, although some have modeled the behavior of cellulose in benzene and cyclohexane. Some studies have used the radial distribution function to characterize water structure relative to specific surfaces, referred to as solvent-accessible surfaces. Instead of surrounding cellulose with water, some studies have introduced a water droplet into the model to investigate solvent-accessible surfaces in terms of the contact angle. Modeling studies have even shown that water induces changes in the crystal structure itself in ways that include affecting the twist of the cellulose strand. New modeling techniques and structure-property linkages will provide additional insight into how to disassemble cellulose materials and into the interaction of cellulose nanoparticles with their environment, which will be useful for suspension-based composite-processing routes.
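
For readers unfamiliar with the radial distribution function, the sketch below shows one simple way it might be computed between a set of reference atoms (for example, surface atoms) and a set of target atoms (for example, water oxygens) in a cubic periodic box. The function name, the cubic-box assumption, and the example coordinates are illustrative; production analyses would normally rely on an established trajectory-analysis package and average over many simulation frames.

```python
import math

def minimum_image(delta, box):
    """Wrap a coordinate difference into the nearest periodic image."""
    return delta - box * round(delta / box)

def radial_distribution(reference, targets, box, r_max, dr):
    """g(r) between reference and target points in a cubic periodic box.

    r_max should not exceed box / 2 for the minimum-image convention to hold.
    Returns one g(r) value per bin of width dr.
    """
    n_bins = int(round(r_max / dr))
    counts = [0] * n_bins
    for ref in reference:
        for tgt in targets:
            d = math.sqrt(sum(minimum_image(t - r, box) ** 2
                              for r, t in zip(ref, tgt)))
            if 0.0 < d < r_max:
                counts[min(int(d / dr), n_bins - 1)] += 1
    density = len(targets) / box ** 3
    g = []
    for i, count in enumerate(counts):
        inner, outer = i * dr, (i + 1) * dr
        shell_volume = 4.0 / 3.0 * math.pi * (outer ** 3 - inner ** 3)
        # Normalize by the count expected for uniformly distributed targets.
        g.append(count / (len(reference) * density * shell_volume))
    return g

if __name__ == "__main__":
    # Hypothetical coordinates: one reference atom, three nearby targets.
    surface = [(0.0, 0.0, 0.0)]
    waters = [(1.2, 0.0, 0.0), (0.0, 1.3, 0.0), (0.0, 0.0, 8.9)]
    print(radial_distribution(surface, waters, box=10.0, r_max=5.0, dr=0.5))
```

Values of g(r) well above 1 at a given distance indicate that targets are enriched there relative to a uniform distribution, which is how structured water layers near a surface show up in such analyses.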

To date, the primary focus of modeling studies of the interaction of cellulose with other polymeric materials has been on characterizing molecular interactions of the various surfaces of crystalline cellulose with the desired polymer. The noncovalent interactions between cellulose and an adjacent molecule are characterized in terms of interaction energies, density profiles, and orientation changes. The interaction between a cellulose crystal and a single cellulose chain has also been characterized by numerically pulling the chain away from the surface and calculating the pull-off force as a function of the initial chain orientation. In addition, the potential for chemical grafting as a means of strengthening the interaction between cellulose and a polymer matrix has been investigated. Further advances in this area of modeling, combined with the development of new structure-property linkages, will provide additional insight that will aid in the development of cellulose nanomaterial composites, particularly in how to tailor the interface chemistries to provide the desired properties in composite structures.

5.3.4. Key Messages on Computational Analysis of Glycans

Computational modeling of glycans and the interactions of glycans with each other and with other molecules is often a complementary tool to other analytical techniques for understanding glycan structures and properties. Although significant advances have been made in the development of computer models and force fields for glycans, accurate predictions remain challenging. This is due to such factors as the flexibility of glycan molecules, the fact that many glycan interactions involve multivalent and weak molecular interactions, the complex role of water in determining glycan three-dimensional structures, and the need for models to accommodate electrical charges on certain subsets of polysaccharides.

5.4. GLYCOENZYMES

The glycome of an organism is defined by glycoenzymes, encoded by the genome, that synthesize and degrade glycans. The high specificity and efficiency of these enzymes can be exploited as tools to produce or degrade glycoconjugates of interest and to manipulate glycans in complex biological systems to study their functions (Kiessling and Splain 2010). These enzymes can also serve as targets for inhibition to alter glycan structures in situ, with potential for applications that benefit human health.

5.4.1. Classes of Glycoenzymes

Numerous enzymes are needed to synthesize and degrade the glycans of animals, plants, and microorganisms. Coordinated expression of these enzymes is required for normal production and degradation of glycans in any cell, and deficiencies or mutations in any of them can result in abnormal biology and disease. For the biosynthesis of glycans, glycosyltransferases are responsible for building glycan chains of defined structure needed by a cell. The resulting glycans can be modified by other classes of enzymes, such as sulfotransferases, O-acetyltransferases, and epimerases, resulting in additional structural diversity that influences the functions of the glycans. Degradation of glycans is carried out by glycosidases, which all organisms rely on for digestion of glycans as nutrient sources. Glycosidases are also required for the normal turnover of glycans produced by cells, and many human deficiencies in these enzymes result in the buildup of undegraded glycans. Genetic deficiencies in many of these enzymes are the basis of many severe human disorders recognized as congenital disorders of glycosylation.

Despite the importance of glycoenzymes, little is known about how they carry out the biosynthesis and degradation of glycans of animals, plants, and microorganisms. Of high current interest is understanding how glycosyltransferases are able to produce glycans that contain information needed to mediate diverse biological processes.

5.4.1.1. Glycosyltransferases

Glycosyltransferases carry out the nontemplate-driven synthesis of glycans in all organisms and in the process transfer information encoded by the genome to the glycan structures that make up the glycome of that organism. Most glycosyltransferases act by transferring a single sugar to a specific hydroxyl group of another saccharide in a growing glycan chain. In this case the fidelity of glycan structures is determined by the high specificity of glycosyltransferases for their substrates, with the product of one enzyme being recognized as the acceptor substrate of the next enzyme, allowing the assembly of glycans of defined structure. It is estimated that the human genome encodes approximately 250 glycosyltransferases (Narimatsu 2006; Henrissat et al. 2009). Some glycosyltransferases, such as those that synthesize core structures, are expressed in nearly every cell, whereas the subset of glycosyltransferases that elaborate terminal sequences on glycoprotein and glycolipid glycans are differentially expressed (Lowe and Marth 2003; Comelli et al. 2006).

Although most glycosyltransferases catalyze the formation of one type of linkage, some promote the assembly of polysaccharides. Mammalian polysaccharides include medically important substances such as heparin and hyaluronan, which are composed of multiple linkages. Plant polysaccharides, some of which are structurally very complex, are the products of plant glycosyltransferases, and these polysaccharides form the majority of plant biomass. Bacterial polysaccharides are often essential to bacterial viability, and they also serve as vaccine candidates. For polymerizing glycosyltransferases the factors that determine polysaccharide length, sequence, and fidelity are much less well known. What is apparent is that each cell type produces a characteristic set of glycan structures encoding information that contributes to the biology of that cell (Lowe and Marth 2003; Comelli et al. 2006). While the precise organization of the glycosylation machinery differs somewhat for plants, bacteria, and other microorganisms, the central role of glycosyltransferases in mediating nontemplate-driven biosynthesis of defined glycan structures is the same.

5.4.2. Applications of Glycosyltransferases and Other Glycoenzymes

As a result of their exquisite specificity, glycosyltransferases and glycosidases are widely recognized as important tools for chemoenzymatic synthesis of glycans and as enzymatic probes of glycan structure. In this context they can be likened to other enzyme classes that have enabled the analysis and synthesis of other biopolymers, such as polymerases and restriction endonucleases for DNA, and proteases for proteins. However, major barriers to the routine use of these enzymes as tools include lack of widespread availability, lack of well-characterized enzymes of desired specificity, lack of access to the appropriate nucleotide sugars, and difficulty in producing stable enzymes in sufficient quantity. As a result, only a few laboratories have the capacity to exploit the power of glycosyltransferases as synthetic tools despite widespread interest in using them. Having a diverse enzymatic toolbox that is widely accessible to the glycoscience community would dramatically accelerate their use and solidify their roles in chemoenzymatic synthesis of glycans and as probes for analysis of glycan structure.

To exploit and understand glycosyltransferases, insights into the generation of sugar-nucleotide donors also are needed. Advances on this front are enhancing access to a wide range of glycosyl donors. Specifically, the engineering of glycosyltransferases such that they run in reverse to catalyze nucleotide-sugar formation is enabling the production of natural and nonnatural building blocks (Gantt et al. 2011a). Similar methods to rapidly generate lipid-linked sugar donors, which are used by many microbial enzymes, could lead to advances in our understanding of host-pathogen interactions and fuel efforts to devise novel antimicrobial agents. Moreover, inhibitors that block the production of unique nucleotide sugars in pathogens could lead to new classes of antimicrobial agents. For example, compounds of this type could lead to new strategies to combat infectious diseases, including tuberculosis (Dykhuizen et al. 2008).

In the same way, inhibitors of glycosyltransferases are needed to investigate their roles and the glycans they produce in biology. Currently, there are only a few glycosyltransferases for which enzyme inhibitors have been identified, and there are inhibitors of only one or two that have in vivo activity. As a result, the field has adopted the much more difficult strategy of using genetic knockouts in mice or Arabidopsis to gain information on the roles of individual enzymes and the structures they produce in vivo. The phenotypes of glycosyltransferase knockout mice strongly suggest that inhibitors of selected enzymes would have therapeutic potential, while Arabidopsis knockouts are helping to identify targets for overcoming recalcitrance in plant biomass. The availability of inhibitors for key glycosyltransferases would be of enormous benefit in elucidating the functions of their glycan products and would validate these enzymes as targets for therapeutic intervention in human disease. It is envisioned that inhibitors of classes of enzymes as well as inhibitors of single specific enzymes will be useful tools to understand the biology of glycans and to provide altered glycans for discovery or therapeutic use.

To address these critical needs, a systematic effort is needed to identify enzymes for glycan assembly and degradation that exhibit the range of specificities and physical properties, such as stability and turnover rate, needed to build an enzymatic toolbox. A parallel effort is needed to identify inhibitors of glycan assembly. The sections below expand on these and other needs for glycosyltransferases and glycan-processing enzymes.

5.4.2.1. Chemoenzymatic synthesis of glycans

There are two general strategies for chemoenzymatic synthesis of glycans: one is to use engineered glycosidases, and the other is to use glycosyltransferases (see also Section 5.1.2.1, which discusses enzymatic synthesis of glycans). The latter is the more widely used method today. The specificity of glycosyltransferases makes them unique synthetic catalysts for production of glycans of defined structure that will serve as reagents to investigate the biological roles of glycans and glycan-binding proteins (Vasiliu et al. 2006; Boltje et al. 2009; Palcic 2011; Xu et al. 2011). As synthetic tools, glycosyltransferases complement chemical synthesis technologies, and the combination of chemical and enzymatic synthesis can produce most natural glycan structures. In general, because of the specificity and regioselectivity of glycosyltransferases, addition of a single monosaccharide by a glycosyltransferase replaces 10 to 12 chemical steps to achieve the same end, which vastly simplifies the synthesis of complex glycans.

Although they exhibit high specificity for natural glycans, many glycosyltransferases exhibit high promiscuity for donor and acceptor substrates with unnatural substituents at positions that do not interfere with the enzymatic activity (Yu et al. 2006; Blixt et al. 2008; Palcic 2011). This allows them to be used for synthesis of glycans with substituents that improve biological activity or contain functional groups for subsequent chemical tagging. This biosynthetic promiscuity can be exploited to introduce functionality into the glycans of living cells by feeding them with chemically modified monosaccharides that are used by the cell's biosynthetic machinery in place of the natural sugar.

The major limitation to widespread use of enzymatic synthesis is the paucity of glycosyltransferases available to the research and industrial communities. In part the shortage of suitable enzymes results from the difficulty of producing mammalian glycosyltransferases in quantities sufficient to meet demand. The most useful and robust synthetic enzymes are bacterial glycosyltransferases that synthesize mammalian-type structures (Gilbert et al. 1998; Yu et al. 2006; Sauerzapfe et al. 2009; Palcic 2011). Identification of additional bacterial enzymes that fill the gaps in the enzymatic armamentarium would be extraordinarily useful. Genomic and metagenomic sequencing of all organisms, to date, has revealed more than 50,000 putative glycosyltransferases, of which only several hundred have been tested and confirmed to be glycosyltransferases (Narimatsu 2006; Henrissat et al. 2009; Palcic 2011). This genetic resource might be tapped to identify enzymes that can be produced to meet the need for these tools.

An alternative to the use of glycosyltransferases for synthesis of glycoconjugates is the application of engineered glycosidases, termed “glycosynthases” (Hancock et al. 2006; Wang and Lomino 2012). Recently, enzymes of this class have been shown to be especially effective for the en bloc transfer of oligosaccharides onto proteins bearing a single glycan moiety. These methods, as well as other advances in protein engineering such as expressed protein ligation, are being used to generate new types of defined glycoproteins.

Although most enzymatic synthesis has been done on a research scale, manufacturing scale is limited only by the lack of current technologies to produce enzymes and substrates cheaply. Engineering bacteria to produce small mammalian oligosaccharides has shown some promise as one approach to synthesis at large scale (Antoine et al. 2003, 2005; Drouillard et al. 2006), but the availability of glycosyltransferases produced by bacteria, yeast, or fungi also would address the scale problem.

5.4.2.2. Glycosidases and glycosyltransferases in glycan structure determination

Although methods for rapid profiling of glycans have advanced the glycoscience field, robust methods for the complete description of glycan structures are lacking. Methods in wide use include mass spectrometry approaches, such as nano-LC/MS and MS/MS, NMR, and x-ray crystallography, and conventional biochemical methods involving radiolabeling, glycosidase digestion, and chromatographic fractionation (Marino et al. 2010). None of these methods provides complete structure information. The lack of a single high-throughput method is made more challenging by the small amounts of glycans that can be obtained for analysis from biological sources. Indeed, as Marino et al. have noted, “No universal method for the rapid and reliable identification of glycan structure is currently available; hence, research goals must dictate the best method or combination of methods” (2010, p. 713).

Glycosidases have long been used to aid in the sequencing of glycans, providing key information for complete structure determination. However, their use is not routine and has not been exploited for high-throughput structure determination. In part this is due to the lack of routine availability and to gaps in the enzymatic toolbox needed to assist in structure determination of diverse glycan structures. Because glycosidases are found in every organism and are critically important for degrading glycans for nutrients, there is an enormous genetic resource for enzymes that will fill the toolbox. This is exemplified by the existing CAZy database, which has identified glycosidase-related genes in more than 2,000 eukaryotic and prokaryotic species (www.cazy.org).

In principle, the high substrate specificity of glycosyltransferases also offers the potential for a systematic approach to sequencing glycans. By analogy with the Sanger method of DNA sequencing, where template-directed chain elongation/termination reveals DNA sequence, a repertoire of glycosyltransferases could be applied in vitro to a glycan moiety of unknown structure, together with nucleotide sugars (radiolabeled or labeled with other reporter groups). As applied to glycan sequencing, the “template” is the structure of the acceptor glycan whose structure is sought, but this template is three-dimensional instead of the two dimensions characteristic of a DNA sequence. Defining the structure of the more information-rich three-dimensional glycan template could be enabled with a repertoire of glycosyltransferases capable of “reading” the template. Some glycosyltransferases have strict acceptor specificity for disaccharide or trisaccharide sequences, enabling them to be used to advantage over glycosidases to gain linkage information, a concept that is also being exploited to detect glycan epitopes of defined sequences on cell surfaces. Mutant glycosyltransferases engineered for informative acceptor and/or nucleotide sugar substrate specificity could make this approach more penetrating and more widely useful (Palcic 2011).

Such an approach could also be useful in expanding the utility of NMR-based glycan structural determinations, by analogy to approaches involving segmental isotope labeling of proteins that are expanding the size limit of NMR spectroscopy to larger proteins (Skrisovska et al. 2010). This technique could be applied to large glycan structures either as isolated moieties or as components of a glycoprotein or glycolipid, using one or more glycosyltransferases with defined substrate specificity and sugar nucleotide substrates whose sugar moieties are labeled with stable isotopes (Macnaughtan et al. 2008; Skrisovska et al. 2010). Glycosyltransferases could be used to assess the presence of a determinant as defined by its substrate specificity, although it is important to note that they cannot be used to solve entire structures or novel structures on their own. However, this is an extremely promising technology that should be further developed to help address the current shortcomings in NMR technologies for determining tertiary structure of glycans, especially those attached to their native proteins.
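The probing logic described in this section can be illustrated with a minimal sketch. It assumes, purely for illustration, that each enzyme in a panel transfers a labeled sugar only when the non-reducing terminus of the glycan matches the enzyme's required acceptor motif. The enzyme names, motifs, and the linear (unbranched) treatment of the glycan are hypothetical simplifications, not characterized specificities.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ProbeEnzyme:
    name: str                        # hypothetical enzyme label
    acceptor_motif: Tuple[str, ...]  # required terminal residues, non-reducing end first
    donor_sugar: str                 # labeled sugar the enzyme would transfer


def incorporates_label(enzyme: ProbeEnzyme, terminus: Sequence[str]) -> bool:
    """Return True if the glycan terminus begins with the enzyme's acceptor motif."""
    n = len(enzyme.acceptor_motif)
    return tuple(terminus[:n]) == enzyme.acceptor_motif


def probe(panel: Sequence[ProbeEnzyme], terminus: Sequence[str]) -> List[str]:
    """Names of the enzymes that would incorporate label into this glycan."""
    return [e.name for e in panel if incorporates_label(e, terminus)]


# Illustrative panel; the names and motifs are placeholders.
PANEL = [
    ProbeEnzyme("enzyme_A", ("Gal(b1-4)", "GlcNAc"), "labeled Sia"),
    ProbeEnzyme("enzyme_B", ("GlcNAc",), "labeled Gal"),
    ProbeEnzyme("enzyme_C", ("Gal(b1-3)", "GalNAc"), "labeled Sia"),
]

unknown_terminus = ["Gal(b1-4)", "GlcNAc", "Man"]
print(probe(PANEL, unknown_terminus))
```

In this toy model a positive result with a single enzyme constrains both the terminal residue and its linkage, which is the kind of information the text suggests glycosyltransferases can provide beyond glycosidase digestion. As the text notes, such probes report the presence of a determinant and cannot solve entire or novel structures on their own.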

5.4.2.3. Glycosyltransferases in glycan engineering of cells

Because they are major components of the cell surface, glycans represent attractive targets for imaging physiology and pathophysiology, both in vitro and in vivo. As mentioned above, many glycosyltransferases are promiscuous and can accept unnatural substituents. At least one group has used this principle to introduce “bio-orthogonal chemical reporters” into monosaccharides that are taken up by cells and incorporated into cell surface glycans. These reporters enable the detection and imaging of glycan structures of living cells in model organisms using bio-orthogonal chemistry to attach a fluorescent label or other biological tag that makes the glycans “visible” (Laughlin and Bertozzi 2009; Sletten and Bertozzi 2011). This approach has been used to label cells for in vivo imaging in mice (Chang et al. 2010) and recently to image the sialome in zebrafish (Dehnert et al. 2012). The same principle can be used to introduce modified sugars directly onto the surface of cells using glycosyltransferases (Ramya et al. 2010; Zheng et al. 2011). This approach also provides information on the underlying glycans on the cell, because the enzyme has a strict specificity for the acceptor sequence it uses to form the product (Khidekel et al. 2004; Boeggeman et al. 2007; Ramya et al. 2010; Zheng et al. 2011). With a large toolbox of glycosyltransferases with well-characterized specificities, this approach can be used to gain much structural information about the glycans on the cell surface.

Glycoengineering approaches are also being used to influence cellular trafficking. For example, using a platform called “glycosyltransferase-programmed stereosubstitution,” scientists have modified existing cellular glycans to create the selectin ligand HCELL (hematopoietic cell E-/L-selectin ligand), which is involved in the attachment of circulating stem cells and white blood cells to endothelial cells. This technique has potential applications to the development of cell-based therapies (Sackstein 2009).

5.4.2.4. Glycosyltransferase inhibitors

Because of the central importance of glycosyltransferases to the synthesis of glycan structures, inhibitors of key enzymes would be of enormous benefit to elucidate the functions of glycans in cell communication and the roles of specific enzymes in the biosynthesis of glycans. Genetic ablation of specific glycosyltransferases in mice has already revealed important biological roles for glycans synthesized by the missing enzyme (Lowe and Marth 2003; Satoh et al. 2005; Ohtsubo et al. 2011). Many phenotypes from these mice have validated individual glycosyltransferases as targets for the development of inhibitors that would provide a therapeutic benefit. Small-molecule inhibitors to such enzymes would be invaluable to the research community as probes to uncover the biological roles of glycans and to assess their therapeutic utility. Specific inhibitors could also be used in place of or in combination with glycosyltransferase knockout mice to reveal additional novel phenotypes that provide information about the functions of glycan ligands and glycan-binding proteins. Despite the obvious need, few glycosyltransferase inhibitors capable of blocking glycosylation in vivo have been identified to date (Lachmann 2003; Brown et al. 2009). Several recent reports describe approaches for high-throughput screening of glycosyltransferase inhibitors that demonstrate the feasibility of screening for inhibitors of these enzymes (Helm et al. 2003; Gross et al. 2005; Rillahan et al. 2011). A systematic effort to screen for inhibitors of a panel of key glycosyltransferases is sure to open a path to the development of inhibitors that will benefit the research community and assess the potential of glycosyltransferase inhibitors as drug development targets.

5.4.3. Key Messages on Glycoenzymes

Enzymes have a range of uses as tools to study glycoscience, including in enzymatic synthesis of glycans, as biochemical probes, and in structural determination. Similarly, inhibitors of enzymes such as glycosyltransferases can be used as important tools in trying to better understand glycan biology and function. Despite their utility as part of the glycoscience toolkit, only limited numbers of glycan-active enzymes from both bacteria and mammalian species are available, and few three-dimensional enzyme structures, particularly from mammals, are known.

5.5. SYSTEMS GLYCOBIOLOGY

As a recent National Research Council report described:

The field of systems biology seeks to integrate … multiple levels of biological knowledge into descriptive, and ultimately predictive, mathematical models, combining experimental knowledge with computational tools in order to study the interactions between the components that make up a particular biological system. As a result, a primary goal of systems biology is to understand how the system being studied functions, what its properties are that arise from the interactions of its individual components (also referred to as emergent properties), and the design principles on which it operates (NRC, 2011, p. 27).

Similarly, systems glycobiology is an approach that integrates biological and chemical information about glycans with mathematical modeling and bioinformatics-enabled data analysis in an effort to understand the networks that control glycan structure and function. Informatics tools are key enablers for processing the data that arise from multiple sources—biochemical pathways (Hossler et al. 2007) and multiple types of analytical structure determination techniques, as well as mathematical and computational modeling. By analyzing and extracting information from this sea of data, glycoscience can be studied in this systems context and ultimately understood and manipulated in controlled ways.

Research of this kind illustrates the possibility of whole-cell simulation, but in order to perform simulations of higher organisms' cells, glycosylation and other posttranslational modifications and their kinetic reaction data will need to be incorporated. Advances in analytical tools will aid in this. Moreover, predictions or assumptions can also be incorporated to perform simulations as necessary. With the availability and success of such simulations, perturbations to the model will enable predictions of phenotypic effects (Karr 2012).

5.6. INFORMATICS AND DATABASES

It is becoming increasingly evident that complex relationships between genomic DNA, transcripts, proteins, and their posttranslational modifications, such as phosphorylation and glycosylation, critically govern phenotypes of whole organisms. The development of informatics to capture, analyze, mine, and disseminate sequence information and datasets associated with genes and proteins has been instrumental in advancing genomics and proteomics. One major area of glycomics deals with understanding complex glycans that are attached to proteins during posttranslational modification and the biological functions mediated by these glycan modifications. Informatics applied to glycomics has been faced with unique challenges. The biosynthesis of glycans is complex, nontemplate driven, and involves tissue-specific isoforms of several glycan biosynthetic enzymes. As a result, it becomes challenging to decipher the entire glycome of a whole organism in the same way that it has been possible for the genome and proteome.

The chemical heterogeneity of glycans also makes it challenging for any single analytical approach to provide a complete description of each glycan structure isolated from a glycoprotein or a cell type. Furthermore, glycan-protein interactions, leading to either the activation or inhibition of a biological response, are often not binary but rather involve more subtle mediation of a signaling pathway. In addition, glycan-protein interactions typically involve multivalency with regard to both the protein and the glycan. Because of these challenges, there are layers of ambiguity in determining primary sequence or chemical structure of a glycan that also impinge on understanding the specificity of glycan-protein interactions that modulate key biological functions.

An important factor in broadening appreciation of glycomics to the larger scientific community is the urgent need to develop databases and computational and informatics tools to acquire, integrate, annotate, mine, and disseminate glycomics datasets such as analytical data, glycan array data, and glycogene expression data (Packer et al. 2008). Many earlier efforts in glycomics focused on structural characterization of glycans and on the development of glycan structure databases and computational tools to assist assignment of glycan structures from high-throughput analytical datasets. The development of these tools has advanced to a point where it is possible to obtain robust and detailed profiling of a majority of glycans isolated from cells, tissues, and individual glycoproteins.

To accelerate the development of additional databases and informatics tools, glycomics can to some extent borrow many of the tools that were developed for proteomics and genomics, but there are specific characteristics of glycans that require the development of different, and unique, tools. The most obvious difference is that glycans, unlike proteins or nucleic acids, are branched, isomeric, and constructed using several types of linkages. A common theme that unites these challenges is that there is no template from which glycan structure originates, and thus an “ensemble” of structures is created. Representing the complexity of glycan structures and the diversity of context—the fact that expression levels for each glycan, as well as glycosylation patterns, differ across cells and tissues—presents a significant challenge for bioinformatics approaches.

5.6.1. Limited Successes in Developing Broadly Available Informatics Tools

Many notable advances in glycomics informatics and database development have focused on interpretation of analytical data, including assignment of NMR and mass spectrometry peaks. These advances have, to a certain extent, made glycan analysis more accessible to the broader research community. For example, to assist researchers in the assignment of glycan structures and features based on NMR data, the characteristic NMR chemical shifts and coupling constants of glycans reported in the literature have been compiled in accessible databases such as at the glycosciences.de portal (http://www.glycosciences.de/sweetdb/; Lütteke et al. 2006) and CASPER (http://www.casper.organ.su.se/casper/; Loss et al. 2006), thus improving the accessibility of NMR as a tool for glycoscientists. Several tools, including Glyco-Search-MS (http://www.glycosciences.de/sweetdb/start.php?action=form_ms_search; Loss et al. 2002) and GlycoWorkbench (http://www.glycoworkbench.org/; Ceroni et al. 2008), have focused on interpretation of mass spectrometry fragmentation patterns through comparison to reference datasets, thereby deducing the most likely glycan structure. Additional development in this area, including pairing with proteomics to enable analysis of glycopeptides and proteins, will further increase the usefulness of these tools.

More recent efforts have focused on developing computational tools to mine multiple high-throughput datasets associated with gene expression studies, glycan profiling, and glycan array screening. One area of application of these tools has been in correlating and predicting profiles of glycan structures in a cell based on expression of glycan biosynthesis enzymes (Kawano et al. 2005). Another area of active development has been in mining glycan array datasets to identify glycan sequence motifs recognized by various proteins, such as plant and animal lectins, pathogen proteins, and antibodies (Hizukuri et al. 2005; Aoki-Kinoshita et al. 2006; Kuboyama et al. 2006; Hashimoto et al. 2008a,b; Porter et al. 2010; Jiang et al. 2011b). These glycan sequence motifs represent a combination of substructures that favor binding and those that are detrimental to binding. Furthermore, identification of such binding motifs facilitates using protein-glycan co-crystal structures to translate biochemical and biophysical aspects of glycan-protein interactions to the biology mediated by these interactions.
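As a rough illustration of motif mining on glycan array data, the sketch below scores each substructure motif by the difference between the mean binding signal of glycans that contain it and the mean signal of glycans that lack it. Positive scores suggest motifs that favor binding, negative scores motifs that are detrimental. The scoring rule is a deliberately simple stand-in for the published methods cited above, and the array values are invented toy numbers, not experimental data.

```python
from statistics import mean
from typing import Dict, List, Set, Tuple

ArrayData = List[Tuple[Set[str], float]]  # (motifs present in glycan, binding signal)


def motif_scores(array_data: ArrayData) -> Dict[str, float]:
    """Mean signal with the motif minus mean signal without it, per motif."""
    all_motifs = set().union(*(motifs for motifs, _ in array_data))
    scores = {}
    for motif in all_motifs:
        with_motif = [s for motifs, s in array_data if motif in motifs]
        without = [s for motifs, s in array_data if motif not in motifs]
        if with_motif and without:  # a contrast needs both groups
            scores[motif] = mean(with_motif) - mean(without)
    return scores


# Toy array: hypothetical signals for a hypothetical sialic-acid-binding protein.
TOY_ARRAY: ArrayData = [
    ({"Neu5Ac(a2-3)Gal", "Gal(b1-4)GlcNAc"}, 900.0),
    ({"Neu5Ac(a2-3)Gal", "Gal(b1-3)GalNAc"}, 800.0),
    ({"Gal(b1-4)GlcNAc"}, 100.0),
    ({"Gal(b1-4)GlcNAc", "Fuc(a1-3)GlcNAc"}, 50.0),
]

scores = motif_scores(TOY_ARRAY)
for motif, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{motif:20s} {score:+.1f}")
```

A real analysis would need to handle motif co-occurrence, replicate variability, and multivalent presentation, none of which this sketch attempts.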

Data mining methods and tools have been developed to solve a variety of problems in glycobiology, including extraction of potential glycan biomarkers; prediction of glycan-binding patterns (Hashimoto et al. 2008a,b; Aoki-Kinoshita et al. 2006); and the analysis of glycan biosynthesis pathways (Krambeck et al. 2009), as provided by the RINGS Web resource (http://www.rings.t.soka.ac.jp). Many methods from theoretical computer science have been applied to glycan analysis, including pairwise (Aoki et al. 2003) and multiple alignment of glycans and the development of “score matrices” for analysis of glycosidic linkages (Aoki et al. 2005). Such applications of existing bioinformatics methods to glycobiology can be made to further elucidate glycan function (Aoki-Kinoshita, 2010). However, currently there is a severe lack of interest in glycoscience within the bioinformatics community, resulting from the lack of a consistent database with relevant links to major databases and of an understandable glycan representation format. Without easily available data of biological interest, bioinformatics research will not progress very far in the glycosciences, creating an ever-increasing gap between the genomics and proteomics world and glycomics. Moreover, without a consistent format for representing glycan structures, not only is there confusion regarding a “correct” representation of glycans, but also the integration of various computational tools becomes difficult.

5.6.2. Critical Need for Development of a Single Integrated Database

Clearly, developing a structural assignment database is key to a larger integrative effort to make glycomics accessible and relevant. Indeed, in the absence of a centralized database at a location such as the National Center for Biotechnology Information (NCBI), glycomics will not gain the attention and respect of the scientific community. Long-term funding and long-term stability of such an internationally supported database is absolutely critical for the future of glycosciences.

Larger and more complete informatics efforts can then focus on development of computational tools to correlate glycan structure with expression of biosynthetic enzymes to link biosynthesis and end product. Also, the development of new technologies, such as glycan array platforms to characterize glycan-protein interactions, has necessitated development of novel tools and database strategies for these high-throughput sources of data. In addition, there is a wealth of data on phenotypic analysis of knockout mice that lack specific glycan biosynthesis enzymes that could benefit from a database. Integration of gene expression, structural characterization, glycan motif recognition by various proteins, and whole-organism phenotyping data will enable a critical understanding of glycan diversity in a normal versus a perturbed cell and how these differences correlate with the physiological state of the cell. To truly “reduce this to practice,” the field will need relational databases to make sense of the huge amount of data that will come from such studies and to develop trait correlations that will ultimately lead back to candidate genes.
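To make the idea of a relational database that leads from structures back to candidate genes more concrete, the following sketch builds a small in-memory SQLite schema linking glycan structures, biosynthetic enzymes, and knockout phenotypes. The table names, column names, and placeholder rows are assumptions made for illustration; they do not describe any existing glycomics database.

```python
import sqlite3

# Hypothetical schema: every name below is an illustrative assumption.
SCHEMA = """
CREATE TABLE glycan (
    glycan_id INTEGER PRIMARY KEY,
    structure TEXT NOT NULL UNIQUE
);
CREATE TABLE enzyme (
    enzyme_id   INTEGER PRIMARY KEY,
    gene_symbol TEXT NOT NULL UNIQUE
);
CREATE TABLE biosynthesis (
    glycan_id INTEGER NOT NULL REFERENCES glycan(glycan_id),
    enzyme_id INTEGER NOT NULL REFERENCES enzyme(enzyme_id),
    PRIMARY KEY (glycan_id, enzyme_id)
);
CREATE TABLE phenotype (
    phenotype_id INTEGER PRIMARY KEY,
    enzyme_id    INTEGER NOT NULL REFERENCES enzyme(enzyme_id),
    description  TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the schema in memory and load placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO glycan VALUES (1, 'glycan_X')")
    conn.execute("INSERT INTO enzyme VALUES (1, 'GENE_A')")
    conn.execute("INSERT INTO biosynthesis VALUES (1, 1)")
    conn.execute(
        "INSERT INTO phenotype VALUES (1, 1, 'placeholder knockout phenotype')"
    )
    return conn


def candidate_genes(conn: sqlite3.Connection, structure: str):
    """Genes (and any recorded knockout phenotypes) linked to a glycan structure."""
    return conn.execute(
        """
        SELECT e.gene_symbol, p.description
        FROM glycan g
        JOIN biosynthesis b ON b.glycan_id = g.glycan_id
        JOIN enzyme e       ON e.enzyme_id = b.enzyme_id
        LEFT JOIN phenotype p ON p.enzyme_id = e.enzyme_id
        WHERE g.structure = ?
        """,
        (structure,),
    ).fetchall()


conn = build_demo_db()
print(candidate_genes(conn, "glycan_X"))
```

The join from structure to enzyme to phenotype is the kind of trait correlation the paragraph above describes. A production system would also need tables for expression data, array binding data, and provenance.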

Unfortunately, current glycobiology databases are largely incomplete, disconnected, and inaccessible to the broader community and have a high percentage of incorrect entries that require correction. Analytical databases today provide only “sound bites” and are missing a great deal of the complexity. In addition, other structural databases need to be made “glycan aware.”

To circumvent these challenges, it is critical that a centralized glycan database be created wherein all glycan structures that have been sequenced and published are registered. This database, then, could be expanded to include information on gene expression and organism phenotyping data. Also needed are reporting standards that specify the minimum information that should be reported about a dataset or an experimental process that allows a user to interpret and use the data entered. This may require manual independent curation of data, although it should be possible to develop a curation system to assist in annotations to a certain extent. In addition, there needs to be a glycan equivalent of the Phred score for nucleic acid bases that can provide the user with a measure of the level of certainty of information on a given linkage in a given structure in a database (Ewing and Green 1998; Ewing et al. 1998). This will allow incomplete yet useful structural data to be included in databases.
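For reference, the Phred score for a base call is defined as Q = -10 log10(P), where P is the estimated probability that the call is wrong, so Q = 20 corresponds to a 1 in 100 chance of error. A glycan analogue does not yet exist; the sketch below simply shows how the same logarithmic scale could be attached to individual linkage assignments, assuming an error probability can be estimated for each one. The linkage labels and probabilities are hypothetical.

```python
import math
from typing import Dict


def phred_quality(p_error: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(P_error)."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def error_probability(quality: float) -> float:
    """Inverse transform: P_error = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)


def linkage_qualities(linkage_errors: Dict[str, float]) -> Dict[str, float]:
    """Attach a Phred-like score to each linkage assignment in one structure."""
    return {linkage: phred_quality(p) for linkage, p in linkage_errors.items()}


# Hypothetical structure: one well-supported linkage and one where the
# linkage position is essentially a coin flip between two alternatives.
example = {
    "Gal(b1-4)GlcNAc": 0.001,
    "Neu5Ac(a2-?)Gal": 0.5,
}

for linkage, q in linkage_qualities(example).items():
    print(f"{linkage:18s} Q = {q:.1f}")
```

Storing a score per linkage, rather than per structure, is what would allow partially characterized structures to be deposited while still flagging which parts are uncertain.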

Currently, the GlycomeDB database has incorporated many major databases in a way that consolidates unique structures and provides links so that the original database entries can be retrieved (Ranzinger et al. 2011). In contrast, the GlycoSuiteDB database is a manually curated database of structures from the literature, and thus the number of entries is small, less than 4,000 versus more than 36,000 in GlycomeDB (Cooper et al. 2003). The total number of GlycomeDB entries represents the sum of the several incorporated databases, and only about 1,000 of the structures are fully characterized, used in biologically known pathways, and nonredundant. Similarly, GlycoSuiteDB contains approximately 1,500 eukaryotic structures fulfilling those requirements. In general, there are 10,000 structures on average in the major glycan structure databases (EuroCarbDB, KEGG Glycan, Bacterial Carbohydrate Structure Database [BCSDB], and Consortium for Functional Glycomics [CFG]), although it should be noted that, with the exception of BCSDB, most of these contain mainly eukaryotic glycans. These databases also mainly contain N- and O-linked glycan structures, whereas glycolipid and proteoglycan structures are few. Moreover, fully characterized glycan structures (including all linkage information) are limited to about 2,000 structures. Therefore, several issues must be addressed in developing a comprehensive glycan (or glycoconjugate) structure database.

5.6.2.1. Standardized representations of glycan (or glycoconjugate) structures

Because glycan structures are not linear, a simple single-letter code for monosaccharides is insufficient to represent glycan structures accurately. This is further complicated by the various naming schemes of monosaccharides. While a database of monosaccharides is currently available (MonosaccharideDB; http://www.monosaccharidedb.org), different researchers prefer to use different methods for representing glycan structures, including IUPAC, LINUCS (Bohne-Lang et al. 2001), Linear Code (Banin et al. 2002), GlycoCT (Herget et al. 2008), and KCF (Aoki-Kinoshita 2010). Although Glyde-II (Sahoo et al. 2005) has been established as the standard format for exchanging glycan structures, it is not human readable. There are also a number of ways to graphically represent glycan structures as cartoons, including the system originating from Stuart Kornfeld, expanded and optimized by Varki et al. (2009) and adopted by the CFG, and the Oxford system (Harvey 2011). To resolve this issue, informatics methods will need to accurately convert across different formats, which involves creating a knowledge base on the chemical structures behind the nomenclature for each naming scheme so that the residues are mapped accurately. MonosaccharideDB stores monosaccharide data as chemical information and provides mappings to various database formats. Database developers will need to keep in mind the various formats that are available and allow queries using different formats; this may be possible by linking glycan structure components with MonosaccharideDB.
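The conversion problem can be illustrated with a minimal sketch in which a branched glycan is held once as a tree and then written out in two different ways: a bracketed, IUPAC-condensed-like string and a simple connection table of nodes and edges. Both serializers are simplified illustrations and are not implementations of IUPAC, GlycoCT, KCF, or any other named format. The class and function names are hypothetical, and the linkage labels use ASCII "a" and "b" for the anomeric configuration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Residue:
    name: str                      # monosaccharide name, e.g. "Man"
    linkage: Optional[str] = None  # linkage to parent, e.g. "b1-4"; None at the reducing end
    children: List["Residue"] = field(default_factory=list)


def to_condensed(res: Residue) -> str:
    """Bracketed linear string; the first child is treated as the main chain."""
    prefix = ""
    if res.children:
        main, *branches = res.children
        prefix = to_condensed(main) + "".join(
            "[" + to_condensed(b) + "]" for b in branches
        )
    suffix = f"({res.linkage})" if res.linkage else ""
    return prefix + res.name + suffix


def to_connection_table(root: Residue) -> Tuple[List[str], List[Tuple[int, int, str]]]:
    """Node list plus (parent_index, child_index, linkage) edges, in depth-first order."""
    nodes: List[str] = []
    edges: List[Tuple[int, int, str]] = []

    def visit(res: Residue, parent_idx: Optional[int]) -> None:
        idx = len(nodes)
        nodes.append(res.name)
        if parent_idx is not None:
            edges.append((parent_idx, idx, res.linkage))
        for child in res.children:
            visit(child, idx)

    visit(root, None)
    return nodes, edges


# The trimannosyl core of N-linked glycans, built from the reducing end outward.
core = Residue("GlcNAc", None, [
    Residue("GlcNAc", "b1-4", [
        Residue("Man", "b1-4", [
            Residue("Man", "a1-3"),
            Residue("Man", "a1-6"),
        ]),
    ]),
])

print(to_condensed(core))
print(to_connection_table(core))
```

Keeping one internal tree and generating each textual format from it is one way to avoid pairwise converters between every pair of formats. Mapping residue names across naming schemes, which the paragraph above identifies as the harder problem, is not addressed here.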

5.6.2.2. Comprehensive representation of glycan and glycoconjugate structures

To characterize accurately the cellular glycome, development of more sensitive analytical techniques, including likely NMR and mass spectrometry, will be vital. In turn, the development of informatics methods to aid in structure determination will also be important, including ones that can integrate data from multiple techniques. It is likely that the development of such informatics methods will require collaboration among computer scientists, analytical chemists, and others.

5.6.2.3. Standard ontology for glycan function and localization

An ontology for representing glycan structures has been proposed, called “GlycO.” However, beyond structures, a formal representation of glycans and how they were determined, their functions, and their relationship to other molecules still needs to be established. MIRAGE—Minimum Information Required for a Glycomics Experiment standard—is currently being developed as a reporting standard for glycomics experiments, based on MIAME and MIAPE. MIRAGE aims to specify “the minimum information that should be reported about a data set or an experimental process, to allow a reader to interpret and critically evaluate the conclusions reached, and to support their experimental corroboration.” Such a standard will serve as the first step toward establishing a well-documented glycan structure database that can be linked back to the original experimental data. Further ontologies for annotating glycan function may be similarly based on existing ontologies for genes and proteins.

5.6.2.4. Links to protein, lipid, and other related databases

To integrate knowledge about glycans with the broader community, glycan structures registered in any database should be linked to the proteins, lipids, cells, and other entities to which the glycan structures were bound or in which they were found. Furthermore, the proteins, viruses, and other entities that bind glycans must be documented and linked wherever possible. Currently, to the committee's knowledge, the UniProt database is the only major protein database that contains information regarding potential glycosylation sites in amino acid sequences. To get to this information, however, the user must know to look for it in UniProt, because it is not directly accessible from GenBank or InterPro. Links to the major protein and lipid databases will facilitate communication with related fields, and some progress is occurring in this area. Glycan information in GlycoSuiteDB is currently linked to UniProt. There are plans to link it to the UniCarbKB database, conceived as a combination of GlycoSuiteDB and EuroCarbDB. Bioinformatics methods can also be applied more easily when glycan data are linked with larger data resources. Additionally, this effort should link with other structural biology efforts aimed at defining the conformation of glycan structures and their interactions with binding partners, because conformation has proven to be one of the driving parameters for specificity and affinity.
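The kind of cross-referencing described above can be pictured as a small index that is queryable in both directions: from a glycan to the proteins that carry it, and from a glycosylation site to the glycans reported there. The following Python sketch is illustrative only. The class names, field names, and identifiers such as "GLYCAN_A" and "PROT_1" are hypothetical placeholders and do not reflect the schema of UniProt, GlycoSuiteDB, UniCarbKB, or any other database.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class GlycanLink:
    """One observation that a glycan was found at a site on a protein."""
    glycan_id: str    # placeholder glycan structure identifier
    protein_acc: str  # placeholder protein accession
    site: int         # 1-based position of the glycosylated residue


class LinkIndex:
    """Stores links once and indexes them by glycan and by protein."""

    def __init__(self) -> None:
        self._by_glycan: Dict[str, Set[GlycanLink]] = defaultdict(set)
        self._by_protein: Dict[str, Set[GlycanLink]] = defaultdict(set)

    def add(self, link: GlycanLink) -> None:
        self._by_glycan[link.glycan_id].add(link)
        self._by_protein[link.protein_acc].add(link)

    def proteins_for(self, glycan_id: str) -> List[str]:
        """All proteins on which the given glycan has been reported."""
        links = self._by_glycan.get(glycan_id, set())
        return sorted({link.protein_acc for link in links})

    def glycans_at(self, protein_acc: str, site: int) -> List[str]:
        """All glycans reported at one site of one protein."""
        links = self._by_protein.get(protein_acc, set())
        return sorted({link.glycan_id for link in links if link.site == site})


index = LinkIndex()
index.add(GlycanLink("GLYCAN_A", "PROT_1", 88))
index.add(GlycanLink("GLYCAN_A", "PROT_2", 12))
index.add(GlycanLink("GLYCAN_B", "PROT_1", 88))

print(index.proteins_for("GLYCAN_A"))   # ['PROT_1', 'PROT_2']
print(index.glycans_at("PROT_1", 88))   # ['GLYCAN_A', 'GLYCAN_B']
```

Keeping both indexes means a user who starts from a protein record can reach glycan data as easily as one who starts from a glycan record, which is the accessibility gap the paragraph describes. Real cross-database links would also need provenance, evidence level, and stable identifiers agreed on by the participating resources.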

5.6.3. Key Messages on Glycan Bioinformatics and Databases

The current challenge for the bioinformatics field is to develop a unified, curated, stable database, with long-term funding, that encompasses glycobiology in a broader context. Although significant efforts have been made, a range of issues remain to be addressed and information about glycans is not accessible in a manner similar to other types of biological information. A particular challenge is standardization and annotation of glycan information for databases, including representation, level of structural certainty, and minimal information.

The development of a unified and integrated database resource not only would aid the field directly but would also help scientists from other disciplines, including clinicians, better appreciate, understand, and become involved in glycoscience. There is a need to develop bioinformatics tools that can make connections between disease and glycan structure and represent those connections in a straightforward manner. The initial efforts to create a database worthy of long-term support will require focus regarding its content and function, as defined by consensus of the community that will use it. Such a database will need to do a few things very well in a sustainable and unambiguous way that is independent of new methodologies. One first step could be the creation of a centralized structural database that can be extended by connecting it to other resources. Such a database must be based at a centralized location to assure long-term stability and continuity and cannot be dependent on any individual scientist or institution. Other supplemental databases with incomplete information may add value if made available in parallel to a fully curated and centralized database. A revolution in the development of such databases would bring other scientists into the field, demystify it, and provide a tool to educate individuals about glycoscience.

5.7. SUMMARY AND FINDINGS

As this chapter makes clear, a diverse suite of tools is available to synthesize glycans; understand glycan structures, functions, and interactions; and share and communicate glycan information across the research community. Important limitations in the toolkit currently restrict glycoscience to a field that is actively practiced by only a relatively small group of specialists. Existing tools are useful and provide a base from which to answer glycoscience questions; however, they are not adequate to advance the field to the point where it can realize its potential widely across biology, chemistry, and materials science. New energy and creative solutions, stemming not only from glycoscience specialists but also from many others in the broader scientific community, will be needed to address these technical challenges.

As a result, the committee finds that:

  • Scientists and engineers need access to a broad array of chemically well-defined glycans.
  • Over the past 30 years, tremendous advances have been made in chemical and enzymatic synthesis of glycans, but these methods remain relegated to specialized laboratories capable of producing only small quantities of a given glycan. For glycoscience to advance, significant further progress in glycan synthesis is needed to create widely applicable methodologies that generate both large and small quantities of any glycan on demand.
  • A suite of widely applicable tools, analogous to those available for studying nucleic acids and proteins, is needed to detect, describe, and fully purify glycans from natural sources and then to characterize their chemical composition and structure.
  • Continued advances in molecular modeling, verified by advanced chemical analysis and solution characterization tools, can generate insights for understanding glycan structures and properties.
  • An expanded toolbox of enzymes and enzyme inhibitors for manipulating glycans would drive progress in many areas of glycoscience.
  • A centralized accessible database linked to other molecular databases is needed to fully realize advancements in knowledge generated by an expanded effort in glycoscience. Glycan information is not currently accessible to the research community in an integrated and centralized manner similar to other biological information.
Copyright © 2012, National Academy of Sciences.
Bookshelf ID: NBK115021
