1. Introduction

1.1. UNDERSTANDING THE LANGUAGE OF LIFE: THE CENTRALITY OF SUGARS

Sugars (see Box 1-1) are everywhere. They are the foundation of all life on Earth. The most important biochemical process on Earth is photosynthesis—plants, algae, and other similar organisms using the energy in sunlight to combine carbon dioxide and water to make sugars. Many of the resulting sugars in plants end up as either starch or cellulose, both polymers of the sugar glucose. Such polymerized sugars—called oligosaccharides, polysaccharides, carbohydrates, or, generically, glycans—are the most abundant molecules on the planet. Cellulose is a polymer of glucose that provides the structural support for all plants and trees, as well as the raw material for clothing, paper products, and wood products. While humans cannot digest cellulose—it is an important part of the indigestible “fiber” in our diets—grazing animals can, and it serves as their major source of energy. Starch is another glucose polymer. It differs only subtly from cellulose, yet humans can digest it into its component glucose molecules, the central feedstock for our metabolic pathways. Human metabolism, and the metabolism of virtually all living things, harvests energy by breaking down glucose into water and carbon dioxide, which is then ready to undergo another round of fixation by photosynthesis.
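The cycle described above can be summarized with the standard textbook net equations for glucose. They are not written out in the report itself and are included here only as a reminder. They show that respiration is, in net terms, the reverse of photosynthesis.

```latex
\begin{align*}
\text{Photosynthesis:}\quad & 6\,\mathrm{CO_2} + 6\,\mathrm{H_2O} \xrightarrow{\text{light}} \mathrm{C_6H_{12}O_6} + 6\,\mathrm{O_2} \\
\text{Respiration:}\quad & \mathrm{C_6H_{12}O_6} + 6\,\mathrm{O_2} \longrightarrow 6\,\mathrm{CO_2} + 6\,\mathrm{H_2O} + \text{energy}
\end{align*}
```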

BOX 1-1

Carbohydrate, Glycan, Saccharide, or Sugar? Carbohydrate: A generic term used interchangeably in this report with sugar, saccharide, or glycan. This term includes monosaccharides, oligosaccharides, and polysaccharides as well as derivatives of these compounds.

Glucose is key to life, but it is also central to disease. Diabetes, for example, results when glucose is not properly controlled by normal metabolic mechanisms. High concentrations of glucose can result in organ damage, while low concentrations can lead to loss of consciousness and sudden death due to inadequate energy. Diabetics must measure their blood sugar frequently to ensure proper glucose levels. Such measurements account for a significant share of the diagnostic tests conducted each year in developed countries.

But glucose is not the only sugar molecule of importance to human health. Our cells carry complex sugars that comprise individual sugar molecules linked to one another in a multitude of ways. These complex sugars are usually referred to as glycans. Glycans are one of the four major classes of macromolecules—nucleic acids, proteins, and lipids being the other three—that are essential for life and are involved in every aspect of biology, medicine, and a number of practical applications. These other three classes often incorporate or rely on glycans for their activity—nucleic acids contain the carbohydrates ribose or deoxyribose, whereas proteins and lipids often require appended glycans for activity (glycoproteins and glycolipids, respectively). These structures, and combinations of these structures, contain information that is used for a wide variety of biological processes. Key facts about glycans and glycoscience are given in Box 1-2.

BOX 1-2

Important Facts About Glycans. Glycans are the most abundant family of organic molecules on the planet. The potential information content of glycans vastly exceeds that of any other class of macromolecules.

For example, one result of 3 billion years of evolution is that every cell of every organism is coated with a layer of glycans—the glycocalyx in animals or the cell wall in prokaryotes, plants, and fungi (see examples in Figure 1-1). The glycocalyx/cell wall contains high information content. On red blood cells the different sugars of the glycocalyx are responsible for the different blood groups—A, B, AB, and O (see Box 1-3). On cells of organs, these and other aspects of the glycocalyx can determine whether a particular person in need of a heart, liver, or kidney transplant can receive an organ from a particular donor.

Top: The image of a red blood cell shows the glycocalyx extending from the cell surface. Bottom: a model of a protein showing the glycans that are attached to it, including N-linked glycans, GPI-anchored glycans, and polypeptide chains

FIGURE 1-1

Glycans are significant components on biological surfaces and as parts of biological molecules. Top, Image of a red blood cell showing the glycocalyx extending from the membrane surface. SOURCE: Voet and Voet 2010, used with permission. Bottom, Scale (more...)

BOX 1-3

ABO Blood Groups. One of the most familiar ways in which the glycan information of a cell influences phenotype is the ABO blood grouping, which is a significant factor in determining which blood transfusions can be carried out. With rare exceptions, human (more...)

Indeed, cell surface glycosylation (i.e., the process by which cells create and display their glycocalyx) is as important to understanding life as is the genetic code, yet our understanding of the information contained in glycosylation is rudimentary at best. In large part this lack of knowledge results from two factors: (1) the remarkable structural complexity of glycans found on cell surfaces and (2) a lack of tools for deciphering glycosylation patterns. Glycans thus got “left behind” in the initial phase of the modern revolution in molecular and cellular biology, resulting in a generation of scientists who may be largely unfamiliar with and untrained in the study of these key molecules of life.

The complexity and high information content of glycans result from the many ways in which they can be assembled from simple sugar building blocks. This is in contrast to the simple ways in which the building blocks of proteins and nucleic acids (amino acids and nucleotides, respectively) are linked together. Protein and nucleic acid biopolymers are linear, and every building block is linked to the next through the same kind of connection. By contrast, sugar building blocks can be linked together at many different sites and in different spatial orientations (i.e., stereochemistries), creating both linear and branched polymers with a wide variety of shapes (see Figure 1-2). Given this combination of diverse building blocks, multiple possible connection sites, and different orientations, the complexity of glycans increases rapidly with size. This diversity not only gives rise to many important and interesting biological functions and chemical properties but also creates challenges for synthesis, purification, and characterization; these structure elucidation challenges are discussed in detail later in this report.
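The rapid growth in complexity can be made concrete with a toy count. The sketch below is an illustration under stated assumptions, not a model taken from the report. It assumes each sugar attaches through its anomeric carbon to one of four hydroxyl positions on the next sugar (2, 3, 4, or 6, as in a hexopyranose such as glucose), in one of two orientations (alpha or beta). It ignores branching, ring-size variants, and 1-1 linkages, so it undercounts real glycan diversity. The function names are hypothetical.

```python
"""Toy count of linear chains: fixed-linkage polymers vs. glycans."""


def linear_polymer_count(alphabet_size: int, length: int) -> int:
    """Chains in which every unit joins the next through one kind of bond."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return alphabet_size ** length


def linear_glycan_count(sugar_types: int, length: int,
                        positions: int = 4, anomers: int = 2) -> int:
    """Unbranched glycans under the simplified model (assumption).

    Each of the (length - 1) bonds independently picks one attachment
    position and one anomeric orientation. Branching is ignored.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    linkage_choices = positions * anomers
    return sugar_types ** length * linkage_choices ** (length - 1)


def glucose_disaccharide_linkages() -> list[str]:
    """Enumerate the 1->x linkages the toy model allows between two glucoses."""
    return [f"{anomer}1-{pos}"
            for anomer in ("alpha", "beta")
            for pos in (2, 3, 4, 6)]


if __name__ == "__main__":
    # Compare chains built from three building-block types.
    for n in (2, 3, 6):
        print(n, linear_polymer_count(3, n), linear_glycan_count(3, n))
    print(glucose_disaccharide_linkages())
```

Even with a single building block, the model allows eight distinct two-glucose linkages. Two of them, alpha1-4 and beta1-4, are the repeating linkages of starch (amylose) and cellulose, respectively, which is the subtle difference between those two polymers.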

Image comparing the chemical structures of a glycan, a nucleic acid, and a protein. Part A shows the three dimensional structure of a glycan, made of sugar building blocks, part B shows a nucleic acid made of linearly-connected bases, and part C shows a protein, made of linearly-connected amino acids

FIGURE 1-2

Comparison of nucleic acids, proteins, and glycans. A, glycan; B, nucleic acid; C, protein.

The tools available today for fully characterizing the complex structures of glycans at low levels are mostly destructive, making it largely impossible to follow the changes in glycosylation that occur on a cell's surface over time. In addition, the diversity of glycan structures makes full characterization of the cell surface glycome (i.e., the totality of glycans with which a cell is coated) an incredible challenge, one beyond the capabilities of current technology. Today, it is possible to obtain only a general idea of the composition of the glycocalyx or cell wall, rather than a detailed molecular-level description. Yet these surface glycans are essential to both understanding and treating many diseases. The pattern of sugars on a cell influences which pathogens (viruses and bacteria) attack it, because many bacteria and viruses recognize specific sugars on particular cell types. In turn, a person's immune system generates antibodies to these invaders based largely on the glycans on these pathogens. Adding complexity, many pathogens carry out molecular mimicry of host glycans in order to evade immune responses. There is also growing evidence that the glycans on cancer cells differ from those on normal cells, presenting a promising opportunity for diagnosis, imaging, and therapy. Beyond their roles on cell surfaces, glycans play important roles in biological communication and signaling (see Box 1-4).

BOX 1-4

Glycan Signaling in Nitrogen Fixation. Nitrogen is an essential element in biological systems and is a key component of proteins and other molecules. To be usable by most organisms, however, the nitrogen available in the atmosphere must first be fixed (more...)

In the area of energy, sugars play an increasingly important role as scientific innovations drive advances in developing energy sources that will be renewable and contribute less to global climate change. Complex glycans, such as the starches and cellulose in plant cell walls (referred to as biomass), are Earth's primary storage location for the products of fixation of carbon into molecules via photosynthesis. These glycans are being exploited as renewable sources of liquid biofuels, such as ethanol. As described above, these materials ultimately can trace their energy content to the sun, so they can be thought of as a form of solar energy—and just as renewable. The challenge is to efficiently harvest the energy contained in the large amount of glycans produced by plants.

Glycoscience is uniquely poised to make significant contributions to meeting this challenge. The polysaccharide components of the insoluble cell walls include cellulose, hemicelluloses, and pectins, which are polymers of sugars that are sometimes linear (cellulose) and sometimes branched (hemicelluloses and pectins). These walls have a generalized global structure, with cellulose embedded in a matrix of other molecules, although the fine details of wall structure differ across plant species, across different plant tissues and organs, and indeed across walls in single cells. A major challenge for plant glycoscientists is to understand how these cell wall components are biosynthesized and how they are put together with lignin to form insoluble plant biomass, as well as how to manipulate and break down biomass more effectively in order to release the sugars for development into fuels.

Glycans can also be used as important materials (for example, as gelling agents in foods) and as a renewable resource for high-value chemicals, plastics, and pharmaceuticals. Wood, composed of lignocelluloses, is a major building material and is used in myriad applications. Other materials, such as most plastics, are derived primarily from petroleum. Glycans can play an important role either as a starting material for the same types of feedstocks that are presently obtained from petroleum or as alternative materials that can be converted directly into plastics with similar or even superior properties to those of today's synthetic materials. As the ability to engineer polysaccharides and tailor their chemical structures and properties advances, the capacity to design new biochemicals and materials with properties that are unachievable today also will greatly expand.

1.2. GENES AND PROTEINS ARE NOT ENOUGH: THE RICH INFORMATION CONTENT OF GLYCANS

The current view of information flow in biological systems starts with the nucleic acid genome, which codes for proteins that function as parts of networks and whose own roles are still being actively studied. After proteins have been assembled, they are nearly always modified, a process generically called posttranslational modification. The terminal stage in this information flow is often the addition of glycans to proteins (glycosylation), which modulates the proteins' activity. One way of looking at this process is that the instructions in the genome encode the properties that will ultimately be observable in an organism (phenotype), whereas the proteome predicts the phenotype. The glycome, however, is the phenotype. The system can also be compared to a switchboard, with the sugars being the “on” and “off” switches or turn pots that modulate the functions of glycoproteins and other molecules and help control the activity of the network. Beyond this digital view of biology, glycans also serve major analog functions, allowing graded modulation of the functions of glycoproteins and other molecules as well as of metabolic circuits and networks. Working backward to understand biological systems will require starting with glycobiology, just as working forward requires starting with genomics.

Unlike nucleic acids and proteins, the structures of glycans are not “hard-wired” in the genome. Because of the multiple linkages that sugars can engage in that produce isomers and branching patterns, glycan structures cannot accurately be described as simple linear sequences of building blocks. Rather, a glycan's most basic structure must be described in three dimensions. Because glycan structures are not template encoded, they are plastic, reflecting myriad factors determined by cellular metabolism, cell type, developmental stage, nutrient availability, other cues from the cell's environment (Rudd and Dwek 1997; Varki et al. 2009), and stochastic events. As a result, the potential information content of glycosylation is far greater than for all the other types of posttranslational protein modifications combined. It is precisely this enormous diversity and plasticity that are critical to the many biological functions of glycans, particularly their modulation of glycoprotein activity or localization and their roles in mediating cell-cell or cell-matrix interactions that are key to both normal physiological development and diseases such as cancer.
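To illustrate why a linear sequence is not enough, the following sketch stores a glycan as a tree in which every bond records its orientation and attachment position. It is a hypothetical data structure for illustration, not a format used in the report or by any particular glycan database. The example encodes the well-known core of N-linked glycans (two GlcNAc residues followed by a mannose that carries two further mannose branches). A tree of this kind captures connectivity and branching only; it says nothing about the three-dimensional shape of the molecule.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Residue:
    """One sugar unit plus the bond that attaches it to its parent."""
    sugar: str
    anomer: str | None = None    # "alpha" or "beta"; None for the root
    position: int | None = None  # carbon on the parent that is bonded
    children: list[Residue] = field(default_factory=list)

    def add(self, child: Residue) -> Residue:
        """Attach a child residue and return it for chaining."""
        self.children.append(child)
        return child


def count_residues(root: Residue) -> int:
    """Total number of sugar units in the tree."""
    return 1 + sum(count_residues(child) for child in root.children)


def branch_points(root: Residue) -> list[str]:
    """Sugars that carry more than one child, in depth-first order."""
    found = [root.sugar] if len(root.children) > 1 else []
    for child in root.children:
        found.extend(branch_points(child))
    return found


def build_n_glycan_core() -> Residue:
    """Man(alpha1-6)[Man(alpha1-3)]Man(beta1-4)GlcNAc(beta1-4)GlcNAc."""
    root = Residue("GlcNAc")  # residue attached to the protein
    second_glcnac = root.add(Residue("GlcNAc", "beta", 4))
    core_mannose = second_glcnac.add(Residue("Man", "beta", 4))
    core_mannose.add(Residue("Man", "alpha", 3))
    core_mannose.add(Residue("Man", "alpha", 6))
    return root


if __name__ == "__main__":
    core = build_n_glycan_core()
    print(count_residues(core), branch_points(core))
```

A protein or nucleic acid of five units could be written as a five-letter string. Here the same five units need per-bond annotations and a branch, which is the information a flat sequence would lose.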

1.3. HOW GLYCOSCIENCE BUILDS ON GENOMICS AND PROTEOMICS

Today, the glycoscience field is at a place similar to where genetics was at the conception of the Human Genome Project. At that time there was enough of an understanding of genetics to know that a concerted effort to sequence the human genome would lead to both fundamental advances in our understanding of genetics and practical applications that would benefit all fields of science. When this enormous effort began in the 1990s, many scientists questioned if it was even feasible to sequence the 3 billion bases in a human genome. Ten years and $2 billion later, the Human Genome Project not only had sequenced a single human genome but had also spawned a technological revolution that today makes it possible to sequence a human genome in only a week at a cost of $1,000. Similarly, the cost of identifying a single nucleotide polymorphism (SNP), a commonly used marker for genetic traits such as disease, fell from $1 per SNP to $0.004 per SNP, opening the door to a wide range of biological questions inconceivable even 10 years ago.

Another impact of the Human Genome Project has been the democratization of genomics. The result is a revolution in our understanding of genetics that spans the simplest single-celled organisms to the characterization of human variation and disease. Sequencing instruments used to be huge and expensive, and, as a result, sequencing was done only at regional centers. Today, sequencing instruments can sit on a benchtop in any laboratory. Now, any laboratory can get DNA sequenced; computer programs can predict structures from sequences for DNA, RNA, and proteins; and DNA or RNA can be ordered online and delivered the next day.

How did all of this happen in such a short period of time? The transformation of genomics, and the generation of an entire new industry, started with the research community issuing a grand challenge that was a huge leap, something beyond any technical capability available at the time. In the end, the tools that were developed to meet this grand challenge now enable and drive the science. The tools of genomics have democratized the field in such a way that thousands of laboratories are now able to ask and address questions that were previously the realm of only a few specialized facilities. Any scientist interested in getting sequence information can do so. Today, because of incredible success at developing sequencing tools, the real cost of sequencing a genome is dominated by informatics, not by the physical process of sequencing. Making sense of genomic data costs far more than acquiring the data.

Glycoscience needs to similarly catalyze its transformation from the realm of a few specialists to a core science practiced by many. To accomplish this transformation, new technologies are needed to thoroughly characterize glycomolecules and synthesize them. Both genomics and proteomics have methods for automated synthesis, sequencing, and amplification. The emerging field of glycomics does not. There are large libraries of genes and proteins available for study but only small libraries of glycans and glycoconjugates. Genetic manipulation is easy for genes and proteins but hard for glycans and glycoconjugates. Finally, the number of enzymes available for manipulating genes and proteins is far larger than the number of glycosidases and glycosyltransferases available. Learning from the experience of genomics, glycomics will need many new and sophisticated informatics solutions to stay abreast of technological developments and avoid the bottlenecks that now limit the advances that come from modern genomics and proteomics.

1.4. WHY NOW? THE CASE FOR CHANGE

To fully understand the workings of living organisms and to fully realize the promise of genomics and proteomics, it will be imperative that science now turn its efforts to deciphering the complexity of glycomics. Unless attention is paid to glycans, a major component of biology will be missed. Glycoscience cannot be overlooked. Without a better understanding of the glycome, a clear understanding of cancer, infectious diseases, and the immune response will not be possible. Glycoscience knowledge will be similarly needed in the exploration of improved biofuels and alternative sources of carbohydrate-based energy and in the development of carbohydrate-based materials with functional new properties. It will not be possible to take full advantage of the revolution in genomics and realize the full potential of the Human Genome Project unless close attention is given to glycomics and how cells make and use the myriad complex glycans that decorate their surfaces. At the same time, advances in genomics resulting from the Human Genome Project provide a major opportunity to understand how mutations alter glycan pathways with functional consequences. Indeed, the time is right for the glycoscience community to initiate an undertaking that leads those conducting biological studies to seriously consider incorporating glycoscience into their work.

Several recent advances make this an opportune time to examine challenges and opportunities in glycoscience and outline a possible roadmap forward. In health, for example, changes in glycosylation are common in tumor cells, and specific glycans have been identified as biomarkers for a variety of cancers (Adamczyk et al. 2012). In some cases, this information is being combined with array technologies to provide a base from which to explore key questions in cancer biology. Do particular glycosylation changes play a role in cancer outcome? Which glycans can serve as the most effective biomarkers for different stages and different types of cancer?

In 2011, the U.S. Department of Energy released an update to the Billion-Ton Study, which re-emphasized the significance of biomass feedstocks from non-food crops for energy and materials (DOE 2011). Many of the energy-rich, non-food crops require the conversion of recalcitrant cellulose into useful chemical precursors. Discoveries in the biological pathways by which plant cell walls are synthesized and deconstructed are similarly providing a compelling base from which to further advance the applications of glycoscience to these fields.

Just as studies of nucleic acids and proteins rely on a suite of tools that allow a broad range of researchers to effectively investigate these molecules, so too does glycoscience rely on its own toolkit. Over the past decade, developments in synthetic and analytical methods such as glycan microarrays have enabled high-throughput analysis of the interactions of glycans with proteins, lipids, and other glycan molecules (Rillahan and Paulson 2011). These data are increasingly being combined into glycan databases to share and aggregate research results within the glycoscience community (Frank and Schloissnig 2010).

Genomics and proteomics have advanced rapidly. Glycoscience and glycomics also have made strides in enabling scientists to understand the role that glycans play in biological systems. Glycoscience researchers have been developing a fundamental knowledge base that can be utilized to help address many of today's major research problems. This knowledge base, when combined with the current set of available tools to probe glycan structure and function, is a powerful resource to better understand human, plant, and microbial biology.

Glycoscience has, until recently, been explored by only a small group of experts, working with more limited information and resources than are available in fields such as genomics and proteomics. What is known about glycoscience and glycomics, the study of the complete set of glycans in an organism, is still incomplete. But the knowledge currently available makes it possible to integrate glycoscience broadly into the fields of human health, energy, and materials science, and the set of tools, while not perfect, provides a base to enable further development and discovery.

1.5. CHARGE TO THE COMMITTEE

Recognizing that glycoscience presents a frontier for discoveries across many fields, the National Institutes of Health, Food and Drug Administration, U.S. Department of Energy, and National Science Foundation asked the National Research Council to convene a committee to explore advances in glycoscience and challenges that must be overcome to move the field forward. The committee was also tasked with articulating a roadmap and a vision for future development of the field (see Box 1-5).

BOX 1-5

Statement of Task. The National Research Council of the National Academy of Sciences will convene an ad hoc committee to assess the importance and impact of glycoscience and glycomics. Glycoscience is the confluence of scientific disciplines that study (more...)

The committee deliberated at three in-person meetings and held numerous teleconferences to address its charge and produce the present report. In addition, the committee convened the Workshop on the Future of Glycoscience in January 2012, which brought together approximately 75 glycoscientists and scientific thought leaders with expertise in biology, chemistry, and materials science to discuss the field and its opportunities and needs. The workshop agenda and participant list are provided in Appendix C. The committee also solicited input from the broader scientific community through its public website, which included several questions to inform the study process. These questions are provided in Appendix D, along with further information on the feedback received and the individuals who shared their thoughts with the committee. This report does not focus on the roles of carbohydrates as food sources and nutritional supplements. Although these are important areas to be explored, they were outside the scope of the committee's study and outside the expertise of the committee's members.

1.6. ORGANIZATION OF THE REPORT

Chapter 2 discusses current glycoscience research efforts in the United States and worldwide. This general baseline helps inform the rest of the report, which lays out a vision for the future of the field. The chapter provides a brief overview of key messages arising from the committee's data gathering, with further details and examples included in Appendix B. In Chapter 3 the committee discusses how glycoscience is embedded in the key areas of health, energy, and materials science—areas that help illustrate the breadth and impact of glycoscience as a discipline. In Chapter 4 the committee poses a set of scientific questions and opportunities designed to illustrate more concretely how new glycoscience knowledge would contribute to answering relevant scientific questions in these fields. These questions are not meant to be comprehensive but rather to provide examples of scientific challenges that, if solved, would yield important basic and applied knowledge. Chapter 5 considers the toolkit for glycoscience in such areas as synthesis, analysis, and informatics. These tools are integral to studying glycoscience and will be needed to successfully address the types of challenges described previously. Finally, Chapter 6 presents the committee's conclusions and recommendations. In conjunction with each recommendation, the committee suggests several 5- and 10-year goals whose accomplishment would significantly advance the field. Together, these goals comprise a roadmap to help enable glycoscience to forge new roads of discovery.

The introductory and concluding chapters of this report are written with a general audience in mind. Chapters 3 and 4, which delve more deeply into the myriad ways that glycans contribute to the three focus areas of health, energy, and materials, presume a basic level of scientific familiarity, although of necessity they do not cover each topic in detail. Chapter 5, which describes the current scientific toolkit for studying glycans, is written largely for the scientific community and for those who have primary responsibility for shaping research programs and directions. The committee's assessment of this toolkit and of the needs and gaps remaining to advance the field is encapsulated in the report's concluding chapter, which lays out a glycoscience roadmap and research goals. Appendixes to the report contain committee member biographies (Appendix A) and additional information on the committee's data-gathering efforts (Appendixes B, C, and D). A glossary of terms also is included (Appendix E).