Chapter 2. Saccharide Structure and Nomenclature

Publication Details

Primary contributions to this chapter were made by A.E. Manzi (Nextran Corporation, San Diego, California) and H. van Halbeek (University of California at San Diego).

Following an introduction on general carbohydrate nomenclature, this chapter provides an overview of the structure and chemistry of monosaccharides, focusing on their stereochemical features. The generation of complex carbohydrate structures from their monosaccharide constituents is described, pointing to the importance of the glycosidic linkage, including its position and configuration (anomericity). The final section of the chapter introduces CarbBank, a computer program that facilitates the search of the CCSD, a database of carbohydrate sequences.

Nomenclature

Suggested as a name more than 100 years ago without knowledge of detailed structure, carbohydrate is not an exact term. It applies to a very large number of materials and includes a wide spectrum of chemical structures. Originally referring to those naturally occurring substances that have a composition according to the formula (C•H2O)n, the meaning of the term carbohydrate today is far more general than “carbon•hydrates” and includes any substance that satisfies this criterion and many derived substances.

In general, carbohydrates contain a number of monosaccharides linked together as oligomers or polymers. The latter are referred to as oligosaccharides, polysaccharides, or, more generically, saccharides, sugar chains, or glycans. The term saccharide (derived from the Greek sakchar, meaning sugar or sweetness) is related to the characteristic taste of many of the mono- and disaccharides. Monosaccharides are the major, but not the only, components of glycans. The relationship of monosaccharides to complex carbohydrates is similar to the relationship between amino acids and proteins or between nucleotides and nucleic acids. The monosaccharide residues within complex carbohydrates may contain noncarbohydrate moieties, such as phosphate and sulfate groups.

The term glycoconjugate is often used to describe any macromolecule that contains a (mono)saccharide covalently linked to another building block of nature, such as an amino acid (peptide) or a lipid. Complex carbohydrates occur in such glycoconjugates as glycoproteins, glycolipids, and proteoglycans. The prefix glyco- or the suffix -saccharide or -glycan in those terms indicates the presence of monosaccharide constituents. Thus, the major classes of biological macromolecules may be designated as nucleic acids, proteins, (complex) carbohydrates (including glycoconjugates), and lipids.

The designation “complex” is an even less exact term than carbohydrate. A carbohydrate is often termed complex if it contains more than one type of monosaccharide building unit. Thus, the glucose polymer cellulose would be a “simple” carbohydrate, whereas a galactomannan polysaccharide is an example of a complex carbohydrate. (However, so-called simple glycans, such as cellulose and starch, may have very complex molecular structures in three dimensions.) In the description of glycoprotein N-glycans (see Chapter 7), complex is used more specifically as a synonym for N-acetyllactosamine-containing chains, implying that high-mannose chains are characterized by a simpler monosaccharide composition. Finally, the term complex carbohydrates includes glycoconjugates, whereas the term carbohydrates per se would not. Additional nomenclature issues will be covered in the various sections of this chapter. For more detailed information on the topics covered in this chapter, see References 1–6.

Monosaccharides: Basic Structures and Stereoisomerism

Our understanding of carbohydrate structure has its origins in the latter part of the nineteenth century with the pioneering studies of Emil Fischer, who was the first to establish the structure of several of the monosaccharides. From a chemical standpoint, monosaccharides can be described as polyhydroxy aldehydes, polyhydroxy ketones, and derivatives thereof. All simple monosaccharides have the general empirical formula (CH2O)n, where n is an integer number ranging from 3 to 9. Regardless of the number of carbon atoms, all monosaccharides can be grouped into one of two general classes: aldoses or ketoses. (The -ose ending is characteristic in carbohydrate nomenclature.) Aldoses contain a functional aldehyde group (–CH=O), whereas ketoses contain a functional ketone group (C=O). Subclasses are then distinguished based on the number of carbon atoms according to the following terms: aldotriose, ketotriose, aldotetrose, ketotetrose, and so on.

Glyceraldehyde is the simplest aldose and dihydroxy acetone is the simplest ketose (Figure 2.1.a), and each can be conceived of as the parent compound of higher (CHOH) homologs in their class. The structures of glyceraldehyde and dihydroxy acetone are also distinct in that glyceraldehyde contains an asymmetric (chiral) carbon atom (Figure 2.1.b), whereas dihydroxy acetone does not. With the exception of dihydroxy acetone, all monosaccharides have at least one asymmetric center, the total number being equal to the number of internal (CHOH) groups (n – 2 for aldoses, n – 3 for ketoses with n carbon atoms). The number of stereoisomers corresponds to 2^k, where k equals the number of asymmetric carbons. For example, an aldohexose with the general formula C6H12O6 and four asymmetric carbons, i.e., four (CHOH) groups, can exist in any one of 16 possible isomeric forms. Eight of these are d forms, and the other eight are l forms.
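
The naming and counting rules above can be checked with a short Python sketch. The function names and the table of stems are illustrative choices made here, not part of any standard library; the code only encodes the n – 2 and n – 3 rules and the 2^k formula stated in the text.

```python
# Illustrative helpers (names are assumptions, not a standard API).
# Stems for carbon counts 3 through 9, the range given in the text.
STEMS = {3: "tri", 4: "tetr", 5: "pent", 6: "hex", 7: "hept", 8: "oct", 9: "non"}
PREFIXES = {"aldose": "aldo", "ketose": "keto"}


def subclass_name(kind: str, n_carbons: int) -> str:
    """Return a subclass name such as 'aldohexose' or 'ketotriose'."""
    if kind not in PREFIXES:
        raise ValueError("kind must be 'aldose' or 'ketose'")
    if n_carbons not in STEMS:
        raise ValueError("n_carbons must be between 3 and 9")
    return PREFIXES[kind] + STEMS[n_carbons] + "ose"


def asymmetric_centers(kind: str, n_carbons: int) -> int:
    """Number of internal (CHOH) groups: n - 2 for aldoses, n - 3 for ketoses."""
    if kind not in PREFIXES:
        raise ValueError("kind must be 'aldose' or 'ketose'")
    return n_carbons - 2 if kind == "aldose" else n_carbons - 3


def stereoisomer_count(kind: str, n_carbons: int) -> int:
    """Number of stereoisomers, 2**k, where k is the number of asymmetric carbons."""
    return 2 ** asymmetric_centers(kind, n_carbons)


if __name__ == "__main__":
    for kind, n in [("aldose", 3), ("ketose", 3), ("aldose", 6)]:
        print(subclass_name(kind, n), asymmetric_centers(kind, n), stereoisomer_count(kind, n))
```

For the aldohexose case this gives four asymmetric carbons and 16 stereoisomers, matching the example in the text; for the ketotriose (dihydroxy acetone) it gives zero asymmetric carbons and a single form.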

Figure 2.1. (a) Structures of glyceraldehyde and dihydroxy acetone in Fischer projection; (b) d- and l-glyceraldehyde in quasi three-dimensional representation. The chiral nature of the central carbon in glyceraldehyde gives rise to two possible configurations.

With d-glyceraldehyde as the parent compound, Figure 2.2 illustrates the structures of all d-aldoses through the aldohexose group. The numbering of the carbon atoms follows the rules of organic chemistry nomenclature, such that the aldehyde carbon is referred to as C-1 and the carbonyl group in ketoses is C-2. The overall configuration (d or l nature) of each sugar is determined by the orientation of the CHOH group most distant from the aldehyde functional group (i.e., with the highest numbered asymmetric carbon atom; this would be C-5 in hexoses, C-4 in pentoses). The d configuration, with the OH at the aforementioned carbon atom projecting to the right, predominates in nature. The “linear” structural representations shown in Figure 2.2 are referred to as Fischer projection formulae.

Figure 2.2. Acyclic forms of the d series of aldoses, ranging from triose to hexose.

Any two sugars that differ only in the configuration around a single chiral carbon atom are called epimers. For example, d-mannose is the C-2 epimer of d-glucose, whereas d-galactose is the C-4 epimer of d-glucose (cf. Figure 2.2). The names of monosaccharides are frequently abbreviated; most common are three-letter abbreviations for simple monosaccharides (e.g., Gal, Glc, Man, Xyl, Rib). Unless specified otherwise, the d configuration is implied in these abbreviated names. Furthermore, a symbolic notation for the monosaccharides that are most abundant in vertebrate glycoconjugates is used in this book (see Chapter 1, Figure 1.4).

In solution, very few sugar molecules exist with free aldehyde or ketone groups; rather, they exist as cyclic hemiacetals or hemiketals, respectively. The hemiacetal linkage is formed by the reaction of an aldehyde group with a hydroxyl group (Figure 2.3). If the reaction is intramolecular, as it is in a monosaccharide, the resultant hemiacetal is cyclic. For reasons of chemical stability, five- and six-membered rings are most common. Generally, aldohexoses form six-membered rings via a C-1–O–C-5 ring closure; ketohexoses form five-membered rings via a C-2–O–C-5 cyclization to yield hemiketals; aldopentoses form five-membered rings through the C-1–O–C-4 linkage, or six-membered rings through a C-1–O–C-5 ring closure.

Formation of a cyclic hemiacetal or hemiketal generates an additional asymmetric center at the original carbonyl carbon atom. The new asymmetric center is termed the anomeric carbon. Two stereoisomers exist because the anomeric hydroxyl group can assume either one of two possible spatial orientations. In the linear Fischer representations, the structure with the anomeric hydroxyl group directed to the same side as the hydroxyl group at the highest numbered asymmetric carbon atom (C-5 for hexoses) is termed the α form and that with the opposite orientation (–OH at C-1 and C-5 going in different directions) is termed the β form. Anomeric isomers exist for all sugars with free “reducing ends” (see below), i.e., with a potentially free aldehyde group; in solution, these anomers are interconvertible and exist in an equilibrium mixture (see below, Mutarotation).
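
The definitions of epimers and anomers can be illustrated with a minimal Python sketch. The dictionary layout and function names are assumptions made for this illustration; the entries record on which side ("R" for right, "L" for left) the hydroxyl group at C-2 through C-5 is drawn in the Fischer projection of each sugar.

```python
# Side of the OH group at C-2..C-5 in the Fischer projection ("R" = right, "L" = left).
# The data layout is an illustrative assumption, not a standard format.
FISCHER = {
    "D-glucose":   {2: "R", 3: "L", 4: "R", 5: "R"},
    "D-mannose":   {2: "L", 3: "L", 4: "R", 5: "R"},
    "D-galactose": {2: "R", 3: "L", 4: "L", 5: "R"},
}


def epimeric_carbon(sugar_a: str, sugar_b: str):
    """Return the carbon number at which two sugars differ if they are epimers, else None."""
    a, b = FISCHER[sugar_a], FISCHER[sugar_b]
    differing = [carbon for carbon in a if a[carbon] != b[carbon]]
    return differing[0] if len(differing) == 1 else None


def anomer(anomeric_oh_side: str, reference_oh_side: str) -> str:
    """Name the anomer from Fischer-projection sides.

    The reference is the OH at the highest numbered asymmetric carbon
    (C-5 for hexoses): same side gives alpha, opposite side gives beta.
    """
    return "alpha" if anomeric_oh_side == reference_oh_side else "beta"


if __name__ == "__main__":
    print(epimeric_carbon("D-glucose", "D-mannose"))    # C-2 epimers
    print(epimeric_carbon("D-glucose", "D-galactose"))  # C-4 epimers
    print(epimeric_carbon("D-mannose", "D-galactose"))  # differ at two carbons: not epimers
    print(anomer("R", FISCHER["D-glucose"][5]))
```

Note that d-mannose and d-galactose differ at both C-2 and C-4, so the sketch reports that they are not epimers of each other.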

Figure 2.3. Hemiacetal formation.

From Figure 2.4, it is obvious that a Fischer projection formula of a cyclic hemiacetal is an awkward-looking and inaccurate representation of the ring structure. A slightly more realistic structural representation is the Haworth projection formula in which both five-membered and six-membered cyclic structures are depicted as planar ring systems, with the hydroxyl groups oriented either above or below the plane of the ring (Figure 2.5). Although it does not really represent the actual three-dimensional structure of a sugar, the Haworth representation has been used since the late 1920s as an easy-to-draw formula that permits a quick evaluation of the relative orientation of the –OH groups in the structure. Because of the structural similarity to the organic compounds furan and pyran, a five-membered cyclic hemiacetal is called a furanose and a six-membered hemiacetal ring is called a pyranose. (These ring sizes may be included with the abbreviated name of the monosaccharide, as f or p, in italics, for example, Glcp or Galf.) The Haworth representations are preferably drawn with the ring oxygen atom in the top (for furanose) or top right corner (for pyranose) of the structure; the numbering of the ring carbons increases in a clockwise direction.

Figure 2.4. The α and β anomers (C-1 epimers) of d-glucopyranose (in cyclic hemiacetal structure). The dashed lines represent a distorted bond, projecting toward the rear.

Figure 2.5. Haworth representations of furanose and pyranose structures. The simplified Haworth form with carbons omitted from the ring is routinely used. Note their apparent similarity to the furan and pyran rings.

For any d sugar, the conversion of a Fischer formula into a Haworth formula proceeds as follows: (1) any groups (atoms) that are directed to the right in the Fischer structure are given a downward orientation in the Haworth structure, (2) any groups (atoms) that are directed to the left in the Fischer structure are given an upward orientation in the Haworth structure, and (3) the terminal –CH2OH group is given an upward orientation in the Haworth structure. For an l sugar, 1 and 2 are the same, but the terminal –CH2OH group is projected downward. The structures of α-d-glucopyranose and β-d-fructofuranose illustrate the conversion (Figure 2.6). Note the shorthand form, in which only dashes are used to represent the positions of the –OH groups and all C-linked H atoms are omitted.
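
The three conversion rules can be written as a small Python sketch. The function name and the dictionary format are assumptions made for illustration; the input lists, for each ring carbon that carries a hydroxyl group, the side on which that group is drawn in the Fischer projection.

```python
def fischer_to_haworth(fischer_sides: dict, series: str = "D") -> dict:
    """Apply the Fischer-to-Haworth rules described in the text.

    fischer_sides maps a ring carbon number to "R" (right) or "L" (left).
    Rule 1: right becomes down. Rule 2: left becomes up.
    Rule 3: the terminal CH2OH group points up for a D sugar, down for an L sugar.
    """
    if series not in ("D", "L"):
        raise ValueError("series must be 'D' or 'L'")
    haworth = {}
    for carbon, side in fischer_sides.items():
        if side not in ("R", "L"):
            raise ValueError("side must be 'R' or 'L'")
        haworth[carbon] = "down" if side == "R" else "up"
    haworth["CH2OH"] = "up" if series == "D" else "down"
    return haworth


if __name__ == "__main__":
    # alpha-D-glucopyranose: OH groups at C-1 (alpha, same side as the C-5 oxygen),
    # C-2 and C-4 are on the right in the Fischer projection; the OH at C-3 is on the left.
    alpha_d_glcp = {1: "R", 2: "R", 3: "L", 4: "R"}
    print(fischer_to_haworth(alpha_d_glcp, series="D"))
```

For α-d-glucopyranose the sketch places the hydroxyl groups at C-1, C-2, and C-4 below the ring, the hydroxyl group at C-3 above it, and the –CH2OH group above it.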

Figure 2.6. Conversion from Fischer to Haworth projection formula. Each hydroxyl (or hydroxymethyl) group projected to the right in the Fischer projection points down in the Haworth formula.

Ring Conformation of Sugars

The planar Haworth structures are distorted representations of the actual molecules. The furanose ring is rather flexible and not entirely flat in any of its energetically favored conformations; e.g., it has a slight pucker when viewed from the side, as seen in the representations of the so-called envelope and twist (or skew) conformations (Figure 2.7). The preferred conformation of a pyranose ring is the chair conformation, similar to the structure of cyclohexane, and thus far from flat. In the chair conformation, the OH groups exist in either axial (vertical) or equatorial (nonvertical) positions (Figure 2.7). The conversion from Haworth projection to chair conformation leaves the downward or upward orientation of ring substituents unaltered. Two chair conformations can be distinguished, designated ⁴C₁ and ¹C₄ (also written 4C1 and 1C4) (Figure 2.7). The first numeral (sometimes written as superscript) indicates the number of the ring carbon atom above the “seat of the chair (C),” and the second numeral (subscript) indicates the number of the ring carbon atom below the plane of the seat (spanned by C-2, C-3, C-5, and the ring O). Chair conformations are designated from structures with the ring oxygen atom in the top right corner of the ring “seat,” resulting in the clockwise appearance of the ring numbering. The boat conformation is energetically unfavorable for almost any pyranose.

Figure 2.7. (a) α-d-glucose in Haworth projection and in its ⁴C₁ and ¹C₄ chair conformations; (b) definitions of axial and equatorial orientations in a chair conformation (atoms C-2, C-3, C-5, and the ring oxygen form the seat of the chair); (c) envelope and twist conformations for a five-membered ring structure.

Chemical Reactions of Monosaccharides

Because of the presence of several functional OH groups in a ring structure and the potential for the existence of a free –CH=O (aldehyde) or C=O (ketone) group, monosaccharides can undergo reactions that are common to alcohols (especially those of polyhydroxy ring structures), aldehydes, and ketones. The scope of our consideration is limited to only a few of these, with a focus on reactions of biological significance.

Mutarotation

The interconversion of the α and β anomeric forms of a monosaccharide (mentioned above) occurs as the hemiacetal ring opens and recloses, yielding the opposite anomeric configuration. This process, which is catalyzed by dilute acid, involves the “open chain” structure with the free aldehyde or ketone group as intermediate (see Figure 2.4).

Mutarotation refers to the rapid change of the optical rotation (denoted [α]D) of a freshly prepared aqueous solution of a single, pure anomeric form of a monosaccharide. The α and β forms of d-glucopyranose are diastereomers, not mirror images, and therefore rotate the plane of plane-polarized light to different extents. β-d-Glucopyranose shows an initial rotation of +19° (in water), whereas α-d-glucopyranose shows an initial rotation of +112°. In either case, the [α]D changes upon dissolution until it reaches a constant value of +52.5°. Mutarotation arises from the complex equilibria set up on dissolution of monosaccharides, as shown schematically in Figure 2.8.
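
As a worked example, the rotation values quoted above can be used to estimate the proportions of the two anomers at equilibrium. The sketch below rests on two simplifying assumptions that are not stated in the text: only the α- and β-pyranose forms contribute appreciably, and the observed rotation is the fraction-weighted average of the two pure-anomer rotations. The function name is an illustrative choice.

```python
def alpha_fraction(rot_alpha: float, rot_beta: float, rot_equilibrium: float) -> float:
    """Fraction of the alpha anomer at equilibrium for a two-component mixture.

    Solves x * rot_alpha + (1 - x) * rot_beta = rot_equilibrium for x.
    """
    return (rot_equilibrium - rot_beta) / (rot_alpha - rot_beta)


if __name__ == "__main__":
    # Rotation values for D-glucopyranose as quoted in the text (degrees, in water).
    x_alpha = alpha_fraction(rot_alpha=112.0, rot_beta=19.0, rot_equilibrium=52.5)
    print(f"alpha: {x_alpha:.1%}, beta: {1 - x_alpha:.1%}")
```

Under these assumptions the equilibrium mixture of d-glucose is roughly 36% α and 64% β.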

Figure 2.8. Mutarotation of d-glucose. The transformations are catalyzed by mildly acidic conditions.

The proportions of the five possible forms (α-p, β-p, α-f, β-f, and open chain) depend on the thermodynamic stability of each and vary widely from sugar to sugar. Generally, the acyclic (aldehyde) form is only present in minor amounts (<0.01%). Nevertheless, the important aspect of this transformation is that sugars, even in the absence of catalytic amounts of H+, can exist in the open-chain structure with a free carbonyl group. Consequently, a sugar can participate in a chemical reaction as either the open-chain form or the cyclic form. Thus, depending on the reaction, the sugar is depicted in the appropriate structure.

Esterification

Alcohols readily form esters when reacted with acids, anhydrides, or acyl halides. The most important types of sugar esters that occur in nature are (1) phosphate esters (including diphosphate esters), (2) acyl esters (with acetic acid or fatty acids), and (3) sulfate esters.

Oxidation

Three different acid derivatives of aldoses can be produced by oxidizing terminal groups to –COOH groups. Acids arising from the oxidation of the terminal –CH=O group are called glyconic acids. If the terminal –CH2OH group is oxidized, a glycuronic acid is produced, and if both terminal groups are oxidized, the product is a glycaric acid. The three acids derived from d-glucose are shown in Figure 2.9. Note that these acids have a tendency to undergo intramolecular esterification reactions, preferably yielding six-membered cyclic lactones. Two examples of such lactones are shown in Figure 2.9.

Figure 2.9. Oxidized forms of d-glucose.

When being oxidized to a –COOH group, the free aldehyde group formally functions as the reducing agent. This type of oxidation reaction provides the foundation for the use of the term reducing terminus or free reducing end. The reducing power of a saccharide is traditionally tested by a colorimetric reaction: Reducing sugars readily reduce Fehling's solution and also ammoniacal silver nitrate.

Reduction

Two important reduced forms of sugars are polyhydroxy alcohols (e.g., alditols, i.e., reduced aldoses) and deoxy sugars. An example of each is shown in Figure 2.10a. Please note that an alditol has lost the ability to form a hemiacetal, i.e., to undergo ring closure. Myo-inositol (Figure 2.10b), the typical constituent of GPI anchors, has a cyclohexane-derived ring structure, but is not a sugar. Figure 2.11 shows the structures of a number of “special” monosaccharides (most of them deoxy monosaccharides) that occur in the glycoconjugates and polysaccharides to be covered in the following chapters.

Figure 2.10. (a) Examples of an alditol and a deoxy monosaccharide. (b) Structure of myo-inositol.

Figure 2.11. A few non-(CH2O)n building blocks of glycoconjugates.

Glycosides

When a hemiacetal (ROH) reacts with an alcohol (R′OH), the product is a full acetal (ROR′). The acetal product is called a glycoside; one distinguishes pyranosides and furanosides. In either case, the newly formed linkage C(anomeric)–OR′ is called a glycosidic bond. The monosaccharide portion of the glycoside is termed the glycone, and the R′ group the aglycone (except when the R′ group is another sugar; see below). The reaction as shown in Figure 2.12 is catalyzed by H+.

Figure 2.12. Glycoside formation: conversion of hemiacetal into acetal.

Oligosaccharides, Polysaccharides, and Glycoconjugates

When the alcohol R′OH in the previous section is one of the –OH groups in another monosaccharide, the resulting ROR′ acetal product is a disaccharide. The glycosidic linkage represents the covalent bond of all monosaccharide-monosaccharide interactions. The relationship of the glycosidic bond to oligo- and polysaccharides is the same as the relationship of the peptide bond to oligo- and polypeptides, and the phosphodiester bond to oligo- and polynucleotides. The glycosidic linkage involves the anomeric hydroxyl group, in α or β configuration, of one monosaccharide and any available hydroxyl group in a second monosaccharide. The formation of α(1→4) and β(1→6) glycosidic linkages between two glucose molecules is depicted in Figure 2.13. The symbolism identifies the OH sites (linkage positions) that are involved in the bond. It should be noted that the reducing-end sugar in an oligosaccharide in solution can undergo mutarotation as described for monosaccharides, the exception being those saccharides that end in a nonreducing disaccharide such as sucrose, i.e., Fru-β-(2↔1)-α-Glc (Figure 2.13), and trehalose, Glc-α-(1↔1)-α-Glc.
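
The linkage symbolism can be captured in a small data structure. The following Python sketch is one possible encoding; the class name, field names, and the rule used to flag a nonreducing disaccharide (both anomeric positions taking part in the bond) are assumptions made for illustration, chosen to reproduce the notation used in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Linkage:
    """One glycosidic linkage between two monosaccharides (illustrative model)."""
    donor: str                # residue contributing its anomeric carbon
    anomer: str               # "α" or "β" configuration at the donor's anomeric carbon
    donor_pos: int            # anomeric carbon number (1 for aldoses, 2 for ketoses)
    acceptor: str             # residue contributing a hydroxyl group
    acceptor_pos: int         # position of that hydroxyl group
    acceptor_anomer: Optional[str] = None  # set only if the acceptor's anomeric OH is used

    @property
    def is_nonreducing(self) -> bool:
        """True when both anomeric positions are tied up in the bond."""
        return self.acceptor_anomer is not None

    def __str__(self) -> str:
        if self.is_nonreducing:
            return (f"{self.donor}-{self.anomer}-({self.donor_pos}↔{self.acceptor_pos})"
                    f"-{self.acceptor_anomer}-{self.acceptor}")
        return f"{self.donor}-{self.anomer}-({self.donor_pos}→{self.acceptor_pos})-{self.acceptor}"


maltose = Linkage("Glc", "α", 1, "Glc", 4)
gentiobiose = Linkage("Glc", "β", 1, "Glc", 6)
sucrose = Linkage("Fru", "β", 2, "Glc", 1, acceptor_anomer="α")
trehalose = Linkage("Glc", "α", 1, "Glc", 1, acceptor_anomer="α")

if __name__ == "__main__":
    for name, link in [("maltose", maltose), ("gentiobiose", gentiobiose),
                       ("sucrose", sucrose), ("trehalose", trehalose)]:
        print(f"{name}: {link}  can mutarotate: {not link.is_nonreducing}")
```

In this encoding, maltose and gentiobiose retain a free reducing end and can therefore mutarotate, whereas sucrose and trehalose cannot.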

Figure 2.13. Structures of disaccharides.

Two disaccharides that differ only in the position and/or configuration of their glycosidic linkage may have significantly different conformations, which in turn account for distinctive physical properties (compare maltose and gentiobiose in Figure 2.13). This statement is readily extrapolated to oligo- and polysaccharides. Moreover, the glycosidic linkage is potentially the most flexible part of a disaccharide structure. Whereas the chair conformation of the constituent monosaccharides is relatively rigid, the torsion angles around the glycosidic bond (ϕ, ψ, and ω; Figure 2.14) may vary to some extent. Thus, a disaccharide of well-defined primary structure can adopt multiple conformations in solution, differing in the relative orientation of the two monosaccharides with respect to each other. The combination of structural rigidity and flexibility is typical of complex carbohydrates and, more than likely, essential to their biological functions.

Figure 2.14. Torsion angles involved in glycosidic linkages. Definition of the dihedral angles ϕ, ψ, and ω. (a) Newman projection along the O-1–C-1 bond with the definition of the ϕ angle of a 1→4 linkage.

Depending on the number of monomers linked to each other via glycosidic bonds, an oligosaccharide is termed a disaccharide, a trisaccharide, and so on, with an upper limit of ten residues as the arbitrary distinction from polysaccharides. Structures are commonly written from the nonreducing end toward the reducing end; this polarity is similar to the convention to write protein sequences from the amino to carboxyl terminus. When a monosaccharide constituent of an oligosaccharide is involved in more than two glycosidic linkages, it serves as a branchpoint in the structure. The ability to form branched structures (as opposed to linear sequences, as commonly found in peptides) is an important feature of carbohydrates.
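
Because branching rules out a simple linear string, a tree is a natural way to hold such a structure. The Python sketch below is a minimal illustration, assuming a tree rooted at the reducing end in which each residue lists the residues attached to it; the class, function names, and the example pentasaccharide (with placeholder residue names R1 to R4b) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Residue:
    """A monosaccharide residue; children are residues glycosidically attached to it."""
    name: str
    children: List["Residue"] = field(default_factory=list)


def walk(residue: Residue, is_root: bool = True) -> Iterator[Tuple[Residue, bool]]:
    """Yield every residue in the tree together with a flag marking the reducing end."""
    yield residue, is_root
    for child in residue.children:
        yield from walk(child, is_root=False)


def count_residues(root: Residue) -> int:
    return sum(1 for _ in walk(root))


def branchpoints(root: Residue) -> List[str]:
    """Residues involved in more than two glycosidic linkages.

    Each residue other than the reducing end has one linkage toward the
    reducing end, plus one linkage per attached child.
    """
    found = []
    for residue, is_root in walk(root):
        linkages = len(residue.children) + (0 if is_root else 1)
        if linkages > 2:
            found.append(residue.name)
    return found


def size_class(root: Residue, oligo_limit: int = 10) -> str:
    """Classify by residue count, using ten residues as the arbitrary upper limit."""
    n = count_residues(root)
    if n == 1:
        return "monosaccharide"
    return "oligosaccharide" if n <= oligo_limit else "polysaccharide"


# Hypothetical branched pentasaccharide: R1 is the reducing end, R3 carries two branches.
example = Residue("R1", [Residue("R2", [Residue("R3", [Residue("R4a"), Residue("R4b")])])])

if __name__ == "__main__":
    print(count_residues(example), branchpoints(example), size_class(example))
```

In the hypothetical example, residue R3 is linked to R2, R4a, and R4b and is therefore the only branchpoint.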

The term homopolysaccharide is used to indicate a carbohydrate polymer that is composed of identical monosaccharide residues; e.g., cellulose and amylose (both of which are glucose polymers) are homopolysaccharides. The presence of two or more different types of monomers characterizes a heteropolysaccharide. Oligo- and polysaccharides can be hydrolyzed by dilute aqueous acid to the constituent monosaccharides (see Chapter 38; Monosaccharide Composition Analysis).

Finally, glycoconjugates are molecules that contain one or more carbohydrate groups covalently linked to a peptide/protein, a lipid, or another biological or nonbiological molecule. The carbohydrate group may be as small as a single monosaccharide or as large as a high-molecular-weight polysaccharide. Often, as in typical glycoproteins, the carbohydrate moiety consists of several oligosaccharides.

CarbBank

A given number of monosaccharides have the potential to generate, through formation of glycosidic linkages, a large number of oligosaccharides of different primary structures. Most of these combinations do in fact occur in nature in various classes of complex carbohydrates. To assist the scientific community in keeping track of the primary structures of carbohydrates elucidated from different sources, the compilation of these structures in a computer-searchable database was begun in 1986. The occurrence of branched structures required specialized algorithms for searching the database, totally unlike those used for linear sequences as found in GenBank.

The complex carbohydrate structure database (CCSD) contains close to 50,000 records of carbohydrate structures and associated text (including literature references). The database includes more than 22,000 unique structures, derived from 14,000 different articles in the literature. CarbBank is the computer software that allows the user to access the information in the CCSD; the program (version 3.2; Spring 1999) runs on IBM-compatible personal computers under Microsoft Windows 95/98 or NT. It may be obtained free of charge, e.g., from ftp://ncbi.nlm.nih.gov/repository/carbbank. No version for Apple Macintosh computers is available. However, the CarbBank web site (URL: http://www.ccrc.uga.edu) allows on-line searches of the CCSD using a standard browser. Scientists are encouraged to use the web site to submit records from their (accepted) publications.

CarbBank has an editor to create database records and a search facility that allows the user to retrieve records based on text (author name, phrase in title of publication, trivial name), citation, molecular weight, composition, types of residues and linkages, complete structures (or fragments thereof), and a number of other criteria. Extensive search help is provided.

References

  1. Allen H.J. and Kisailus E.C., eds. 1992. Glycoconjugates: Composition, structure, and function. Marcel Dekker, New York.
  2. Collins P.M. and Ferrier R.J. 1995. Monosaccharides: Their chemistry and their roles in natural products. Wiley, Chichester.
  3. El Khadem H.S. 1988. Carbohydrate chemistry: Monosaccharides and their oligomers. Academic Press, San Diego.
  4. Guthrie R.D. and Honeyman J. 1974. Introduction to carbohydrate chemistry (4th edition). Oxford University Press, United Kingdom.
  5. Morrison R.T. and Boyd R.N. 1992. Organic chemistry (6th edition). Prentice Hall, Englewood Cliffs, New Jersey.
  6. Rao V.S.R., Qasba P.K., Balaji P.V., and Chandrasekaran R. 1998. Conformation of carbohydrates. Harwood, Singapore.