NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Research Council (US) Committee on Computer-Assisted Modeling. Computer-Assisted Modeling: Contributions of Computational Approaches to Elucidating Macromolecular Structure and Function. Washington (DC): National Academies Press (US); 1987.


Computer-Assisted Modeling: Contributions of Computational Approaches to Elucidating Macromolecular Structure and Function.


8. Structure and Function of Complex Carbohydrates

Complex carbohydrates are very common in animals, plants, and bacteria. They are constituents of cell membranes and of other subcellular structures, and they are also found in physiological fluids such as blood, tears, milk, and urine. It was estimated recently that the covalent structures of between 4,000 and 6,000 natural carbohydrates have been determined (DOE, 1987). Many complex carbohydrates are unsubstituted at their reducing ends and are referred to as oligosaccharides or polysaccharides; examples include the oligosaccharides of milk, the cellulose of plant cell walls, and storage forms such as starch and glycogen. Many other naturally occurring complex carbohydrates are covalently linked to other molecules, such as proteins or lipids, through a glycosidic linkage of the sugar residue at their reducing end, forming glycoconjugates.


Glycoproteins have many functions in higher organisms. Collagen is an important structural element in the extracellular space and in cartilage, bone, and basement membranes. Mucins are significant as lubricants and protective agents in mucous secretions. Important immunological molecules of the glycoprotein class include the immunoglobulins, histocompatibility antigens, blood group antigens of the ABO and Lewis types, complement, components of the blood clotting mechanism, and interferon. Many human plasma proteins, such as fetuin, transferrin, and ceruloplasmin, are glycoproteins, as are several hormones, such as chorionic gonadotropin and thyrotropin. Most of the animal and plant lectins are glycoproteins, as are the lysosomal enzymes. The recognition and binding of lysosomal enzymes to specific receptors in the Golgi apparatus and on the cell surface involve one or more phosphorylated mannose residues on N-linked oligosaccharide chains. Recognition sites on cell surfaces for the binding and uptake of hormones and for interactions with other cells, viruses, and bacteria are also glycoproteins.

Many of the cell surface functions of glycoproteins have also been proposed for the neutral and acidic glycosphingolipids. In addition, certain glycosphingolipids of the ganglioside class have recently been found to inhibit the mitogenic response to cell growth factors by allosteric modulation of their cell surface receptors (Bremer et al., 1986). Oncogenic transformation by viral infection or chemical mutagens usually alters the cell surface pattern of glycosphingolipids so that certain types increase greatly in quantity. In some cases there are also qualitative differences due to the expression of genes that are silent in normal differentiated cells. These changes are particularly important in tumor cells, where tumor-associated antigens may provide a basis for specific diagnostic assays based on monoclonal antibodies and, perhaps eventually, for treatment.

The binding between glycosaminoglycans and other extracellular macromolecules contributes significantly to the structural organization of the connective tissue matrix. Because of their remarkable anionic character, all of the glycosaminoglycans, except those that lack sulfate groups or carboxyl groups, bind electrostatically to collagen at neutral pH. Dermatan sulfate, which appears to be the major glycosaminoglycan synthesized by arterial smooth muscle cells, binds strongly to plasma lipoproteins, and heparin interacts with several plasma proteins, including clotting factors IX and XI and antithrombin III. Interestingly, the 1:1 stoichiometric binding of heparin to Lys residues of antithrombin III is believed to induce a conformational change in antithrombin III that increases its binding to thrombin; this binding inactivates the thrombin. Cells growing in tissue culture deposit hyaluronic acid on the surface of Petri dishes, which gives them a substratum for attachment during growth. The proteoglycans have also been implicated in the regulation of cell growth, possibly through nuclear effects on chromatin structure and activation of DNA polymerase, and may mediate cell-cell communication and the shedding of cell surface receptors.


The role of carbohydrates in biological function poses a particularly challenging problem for the future. The synthesis of these glycoconjugates occurs during their intracellular transport from the site of initial assembly of a lipid-linked intermediate (glycoproteins) or ceramide (glycosphingolipids) in the endoplasmic reticulum, through the Golgi apparatus, to the cell surface, intracellular organelles, or extracellular space. Their synthesis requires a family of activated sugar donors called sugar nucleotides that are synthesized in the cytosolic fraction of cells from sugar phosphates and nucleoside triphosphates. An interesting exception is the sugar nucleotide of sialic acid, called cytidine monophosphate sialic acid (CMP-NeuAc), which is synthesized in the nucleus from free sialic acid and CTP. The enzymes involved in glycoconjugate biosynthesis are glycosyltransferases that catalyze the transfer of sugar residues from the sugar nucleotides to the nonreducing end of a growing carbohydrate chain.

The distinction between glycoconjugate biosynthesis and protein synthesis is key. Protein synthesis occurs on a template of messenger RNA and is therefore determined by the genetic code of a single structural gene.¹ In sharp contrast, glycoconjugate synthesis is accomplished by the stepwise addition of sugar units, using a different enzyme for each step. No single DNA sequence determines the primary structure of a complex carbohydrate, because the order in which sugars are added depends on the substrate specificities and kinetic characteristics of the different glycosyltransferases, each of which is encoded by a different structural gene. It is therefore impossible to predict the primary structures of complex carbohydrates from DNA sequences, and the three-dimensional structures of glycoproteins, glycosphingolipids, and other molecules containing complex carbohydrates can never be completely predicted without experimental structural analysis of the carbohydrates.
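The contrast with template-directed protein synthesis can be illustrated with a toy model. In the sketch below, each hypothetical glycosyltransferase adds one sugar only when it recognizes the residue at the chain's nonreducing end; the enzyme names and acceptor rules are invented for illustration, and real chains are branched rather than linear. Two cells that share the same core but express different enzymes end up with different products, even though nothing in the model plays the role of a template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transferase:
    """Hypothetical glycosyltransferase: adds `donor` to the chain only when
    the nonreducing end is `acceptor_end` (a deliberately simplified rule)."""
    name: str
    acceptor_end: str
    donor: str


def synthesize(core: list[str], enzymes: list[Transferase], max_steps: int = 10) -> list[str]:
    """Extend a linear chain one residue at a time until no enzyme accepts it.

    No template is consulted: the product is fixed only by which enzymes are
    present and what each one recognizes.
    """
    chain = list(core)
    for _ in range(max_steps):
        for enzyme in enzymes:
            if chain[-1] == enzyme.acceptor_end:
                chain.append(enzyme.donor)
                break  # one residue added; re-examine the new nonreducing end
        else:
            break  # no enzyme recognizes the current end: synthesis stops
    return chain


# Same core, two hypothetical enzyme repertoires.
cell_a = [Transferase("GalT", "GlcNAc", "Gal"), Transferase("SiaT", "Gal", "NeuAc")]
cell_b = [Transferase("GalT", "GlcNAc", "Gal"), Transferase("FucT", "Gal", "Fuc")]

print(synthesize(["Man", "GlcNAc"], cell_a))  # ['Man', 'GlcNAc', 'Gal', 'NeuAc']
print(synthesize(["Man", "GlcNAc"], cell_b))  # ['Man', 'GlcNAc', 'Gal', 'Fuc']
```

When two enzymes recognize the same end, the sketch simply lets the first one listed act; in a real cell the outcome of such competition would depend on the kinetic characteristics of the enzymes.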

Snider (1984) reported that glycoproteins of the N-linked type are synthesized as a cotranslational event in the rough endoplasmic reticulum. While the polypeptide chain is being translated on messenger RNA and concurrently passed through the endoplasmic reticulum membrane into the cisternal space (lumen), a single oligosaccharide is coordinately synthesized on a phosphorylated polyisoprenoid alcohol (dolichol in higher animals and smaller, similar substances in insects, yeast, and plants). The entire precursor oligosaccharide is then transferred to appropriate asparagine residues on the nascent polypeptide chain (probably before it folds into a tertiary structure) according to rules of specificity that are not completely understood. Transfer requires an Asn-X-Ser or Asn-X-Thr sequence, but additional factors are involved as well. Accessibility of the Asn residue may be one such factor, and this possibility could be assessed by the predictive methods described in this report.
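Because the Asn-X-Ser/Thr requirement is a simple sequence pattern, candidate sites can be located by scanning the protein sequence. The sketch below uses a regular expression and, as an added assumption not stated in this text, excludes proline at the X position, a commonly reported refinement. A match marks only a potential site; whether the oligosaccharide is actually transferred depends on additional factors such as the accessibility of the Asn residue.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The lookahead (?=...)
# lets overlapping sequons such as "NNST" both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 0-based positions of Asn residues that begin an Asn-X-Ser/Thr sequon."""
    return [match.start() for match in SEQUON.finditer(protein.upper())]


print(find_sequons("MKNGSLLNPTAVNQT"))  # [2, 12]; "NPT" at 7 is skipped (X = Pro)
```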

The second stage of N-linked glycoprotein synthesis involves extensive posttranslational modification of the protein-linked precursor oligosaccharide by the removal and addition of sugars. In many cases the protein moiety is also modified by partial proteolytic cleavage and/or by the addition of function-modifying groups to specific amino acid residues. Posttranslational modification begins in the rough endoplasmic reticulum with the removal of the three glucose residues by two specific membrane-bound glucosidases. These glucose residues appear to serve solely to enable transfer of the oligosaccharide chain from dolichol pyrophosphate to nascent polypeptide chains. It will be interesting to determine from three-dimensional structures and predicted conformations how these residues interact with the transferase enzyme involved at this step. Mature high mannose oligosaccharide chains are then formed by the removal of up to four mannosyl residues from the three branches of the precursor structure. At least three different alpha-mannosidases in the Golgi apparatus are involved in this process. Like lysosomal glycosidases, these enzymes and the two glucosidases are hydrolases, but their activities are greatest at neutral pH, whereas lysosomal enzymes have their greatest catalytic activity at acid pH.
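The bookkeeping of these trimming steps can be checked with a short composition calculation. The sketch assumes the widely cited precursor composition Glc3Man9GlcNAc2, which is consistent with the text: removing the three glucose residues and then four mannose residues leaves the five-mannose structure from which complex and hybrid chains are built. Only composition is tracked, not linkage positions or branch structure.

```python
from collections import Counter

# Widely cited composition of the dolichol-linked precursor (an assumption here):
# three Glc, nine Man, and the two core GlcNAc residues.
precursor = Counter({"Glc": 3, "Man": 9, "GlcNAc": 2})


def trim(chain: Counter, residue: str, count: int) -> Counter:
    """Composition after a glycosidase removes `count` residues of one sugar."""
    if chain[residue] < count:
        raise ValueError(f"only {chain[residue]} {residue} present, cannot remove {count}")
    result = chain.copy()
    result[residue] -= count
    return +result  # unary plus drops sugars whose count fell to zero


er_trimmed = trim(precursor, "Glc", 3)  # the two ER glucosidases
man5 = trim(er_trimmed, "Man", 4)       # Golgi alpha-mannosidases, up to four Man

print(dict(er_trimmed))  # {'Man': 9, 'GlcNAc': 2}
print(dict(man5))        # {'Man': 5, 'GlcNAc': 2}
```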

In eukaryotic cells, the high mannose oligosaccharide with five mannose units (see Figure 8-1) is the direct precursor of complex and hybrid structures. The initial step in the Golgi apparatus is the addition of an N-acetylglucosamine residue to the last remaining Man on branch I (*), after which the remaining two Man residues on branches II and III can be removed by alpha-mannosidases that are almost certainly different from those involved in earlier steps. Additional branches may be made at this point to produce tri- and tetra-antennary structures, and the final stages of processing are carried out by the addition of galactose, N-acetylglucosamine, sialic acid, and fucose residues to give mature, complex, N-linked chains. An interesting N-acetylglucosaminyltransferase may add a beta-1,4-linked GlcNAc residue to the branched beta-linked mannose residue of the inner core region (0) to give a “bisected structure.” This step has been studied intensively by Carver and coworkers, who have examined the structural specificity of the enzyme toward different conformations of the precursor oligosaccharides (Carver and Brisson, 1984).
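The order of these Golgi steps follows from substrate specificity rather than from any program: the mannosidases that remove the two remaining Man residues on branches II and III act only after GlcNAc has been added to branch I. The sketch below encodes that one dependency as a precondition on a hypothetical processing rule; feature labels such as "Man5" and "GlcNAc_I" are invented shorthand, and the later additions of galactose, sialic acid, and fucose are not modeled.

```python
# Each processing step: (name, features required, features removed, features added).
# Feature labels are hypothetical shorthand, not standard nomenclature.
GLCNAC_T = ("GlcNAc transfer to branch I", frozenset({"Man5"}),
            frozenset(), frozenset({"GlcNAc_I"}))
MANNOSIDASE = ("removal of Man on branches II and III", frozenset({"Man5", "GlcNAc_I"}),
               frozenset({"Man5"}), frozenset({"Man3"}))


def process(start: frozenset, steps: list) -> tuple[list[str], frozenset]:
    """Apply each step at most once, whenever its requirements are met,
    until no further step can act. Returns (order applied, final features)."""
    state = set(start)
    applied: list[str] = []
    progress = True
    while progress:
        progress = False
        for name, needs, removes, adds in steps:
            if name not in applied and needs <= state:
                state = (state - removes) | adds
                applied.append(name)
                progress = True
    return applied, frozenset(state)


# Listing the mannosidase first does not matter: it waits for the GlcNAc.
golgi = [MANNOSIDASE, GLCNAC_T]
print(process(frozenset({"Man5"}), golgi)[0])
# ['GlcNAc transfer to branch I', 'removal of Man on branches II and III']

# Without the transferase, the mannosidase never acts.
print(process(frozenset({"Man5"}), [MANNOSIDASE])[0])  # []
```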

FIGURE 8-1. Intermediate partially processed asparagine-linked carbohydrate chain of a glycoprotein.



It is likely that predictive methods will be employed in studies of processing pathways and the extent of processing of oligosaccharide chains. If control arises from an enzyme specificity for a particular three-dimensional structure of the substrate, it may be possible to determine these preferences and, from predictions of the distributions of three-dimensional structures of the oligosaccharide attached to the glycoprotein substrate, predict how far the carbohydrate chain will be processed.
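One way to make this idea quantitative, under strong simplifying assumptions, is to treat the attached oligosaccharide as a small set of discrete conformers at equilibrium, compute their Boltzmann populations from relative energies, and sum the populations of the conformers an enzyme is assumed to accept. The energies and the choice of competent conformers below are hypothetical, and a fraction computed this way is at most a rough indicator of how far processing might proceed, not a kinetic prediction.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_populations(energies: list[float], temperature: float = 310.0) -> list[float]:
    """Equilibrium fraction of each discrete conformer from its relative energy (kcal/mol)."""
    e_min = min(energies)  # shift to the lowest energy to avoid overflow
    weights = [math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def competent_fraction(energies: list[float], competent: list[bool],
                       temperature: float = 310.0) -> float:
    """Summed population of the conformers an enzyme is assumed able to process."""
    populations = boltzmann_populations(energies, temperature)
    return sum(p for p, ok in zip(populations, competent) if ok)


# Hypothetical relative energies of three conformers of one substrate, and a
# hypothetical judgement of which of them fit the enzyme's binding site.
energies = [0.0, 0.8, 2.5]
competent = [False, True, True]
print(f"{competent_fraction(energies, competent):.2f}")
```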

Lysosomal enzymes contain one or more phosphate groups on mannose residues of the high mannose type oligosaccharide chains. The mannose-6-phosphate groups are specific recognition markers that are involved in the transport of lysosomal enzymes from the Golgi apparatus or outside the cells into lysosomes. Two membrane-bound mannose-6-phosphate receptors have been discovered in the plasma membrane; at least one of them also resides in the Golgi membranes. Although their binding specificities have been probed in some detail, other aspects have not been determined: the nature of the interaction of the phosphorylated mannose residues with the receptors and the three-dimensional structures of the lysosomal enzyme-receptor complexes.

Another interesting aspect of lysosomal enzyme synthesis is the determination of the structural domains on the folded proteins that are recognized by the enzyme that initiates phosphorylation of mannose residues, an N-acetylglucosamine-phosphotransferase (GlcNAc-P transferase) in the Golgi apparatus. This recognition is the mechanism by which only lysosomal enzyme proteins are selected for phosphorylation. It is especially important because one form of genetic lysosomal storage disorder, mucolipidosis II, results from a defect in the domain of the GlcNAc-P transferase that binds lysosomal enzyme proteins. Perhaps this problem can be solved only by computer modeling to predict the three-dimensional structures of both proteins.

Glycosphingolipids are synthesized in an analogous manner, except that ceramide plays the role that dolichol plays for glycoproteins and that sugars are transferred directly from a sugar nucleotide to the acceptor glycolipid. Ceramide is an acceptor for either glucose (from UDP-Glc) or galactose (from UDP-Gal), giving glucosylceramide or galactosylceramide. These simple glycosphingolipids predominate in human plasma and the brain, respectively, and also serve as precursors for more complex glycosphingolipids. In most organs, including the brain, the major pathways involve conversion of glucosylceramide to lactosylceramide, Gal-beta-1,4-Glc-Cer. Lactosylceramide is the substrate for several glycosyltransferases, whose products are the first intermediates in the synthesis of related glycosphingolipids that may be classified according to their general structural characteristics. More than 100 different glycosphingolipids have already been characterized, and new compounds are still being discovered. Although some glycosphingolipids contain between 15 and 35 or more sugar residues, most of the commonly occurring types have between 4 and 10 residues in the oligosaccharide chain.


A complete understanding of the interactions between carbohydrates and proteins (enzymes, lectins, antibodies, and cell surface receptors) will depend on the determination of accurate three-dimensional structures of both kinds of molecules. As noted above, the primary structures of the oligosaccharide chains of complex carbohydrates cannot be deduced from DNA sequences and so must be determined by chemical and spectroscopic analysis. Modern chromatographic methods of separation, together with mass spectrometry and nuclear magnetic resonance (NMR), allow complete analysis of a primary structure on a 1-micromole sample. Such an analysis must establish the composition; the arrangement of sugar residues; ring size; the positions of glycosidic linkages and their anomeric configurations; and the location and chemical nature of noncarbohydrate substituents such as lipids, sulfate, and phosphate groups.

Three-dimensional structures of carbohydrates describe the spatial arrangements of the individual sugar residues. Most commonly occurring mammalian complex carbohydrates consist of sugar residues in the pyranose ring form, whose most stable and rigid conformations are the chair forms. When two sugar residues are joined covalently in a glycosidic linkage, rotation is possible about the two bonds to the glycosidic oxygen atom that links the rings, and the resulting disaccharide can therefore assume a number of different conformations corresponding to rotations about these two bonds. It is customary to designate the dihedral angles at the glycosidic linkage (see Figure 8-2) by the Greek letters phi (ϕ) and psi (ψ), where the reference conformation (ϕ = 0°, ψ = 0°) is the conformer in which the C-1–H-1 bond eclipses the O–C′-X′ bond and the C-1–O bond eclipses the C′-X′–H-X′ bond.
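Given atomic coordinates from a model or crystal structure, ϕ and ψ can be computed as ordinary torsion angles: ϕ as the torsion H-1, C-1, O, C′-X′ and ψ as C-1, O, C′-X′, H-X′, both of which are zero in the eclipsed reference conformation just described. The sketch below uses the standard atan2 formulation of a signed dihedral angle; the coordinates in the example are toy values, not a real disaccharide.

```python
import math

Vec3 = tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Signed torsion angle p0-p1-p2-p3 in degrees; 0 when p0 eclipses p3
    viewed along the p1->p2 bond, positive for clockwise rotation."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    k0, k2 = _dot(b0, b1), _dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: central bond along z, far atom rotated 60 degrees.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0.5, math.sqrt(3) / 2, 1)), 6))  # 60.0
```

For a real linkage, passing the coordinates of H-1, C-1, O, C′-X′ in that order gives ϕ, and passing C-1, O, C′-X′, H-X′ gives ψ.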

FIGURE 8-2. Dihedral angles determining the spatial relationship of two sugar residues in a disaccharide.



The relative orientations of adjacent sugar residues in an oligosaccharide chain are described by specifying the rotational angles (ϕ, ψ) at each glycosidic oxygen atom. When these angles are the same at every linkage, the chain has a helical conformation with n residues per turn and a unit translation (rise per residue) h along the helical axis. If n and h are available from x-ray data, then ϕ and ψ can be computed, and vice versa. If ϕ and ψ differ among the glycosidic linkages of an oligosaccharide chain, the three-dimensional structure becomes nonperiodic and, for extreme variations, assumes a random coil conformation. Information about such departures from regularity can be obtained from light-scattering, viscosity, sedimentation, and diffusion measurements.
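For a regular chain, the rigid-body motion that carries one residue onto the next is the same at every linkage, and n and h can be read off from it: the rotation angle θ per residue gives n = 360°/θ, and the component of the translation along the rotation axis gives h. The sketch below assumes that this motion is already available as a rotation matrix and translation vector; building it from ϕ, ψ, and the residue geometry is a separate step that is not shown. With θ taken between 0° and 180°, a positive h corresponds to a right-handed helix and a negative h to a left-handed one. The sketch does not handle θ = 0° or θ = 180° (twofold helices), where the axis cannot be recovered from the antisymmetric part of the matrix.

```python
import math


def helix_parameters(rot: list[list[float]], trans: tuple[float, float, float]) -> tuple[float, float]:
    """Residues per turn (n) and rise per residue (h) for the rigid motion
    x -> rot @ x + trans that carries one residue onto the next."""
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding error
    theta = math.acos(cos_theta)  # rotation per residue, radians in [0, pi]
    if theta < 1e-9 or math.pi - theta < 1e-9:
        raise ValueError("axis not recoverable from the antisymmetric part at 0 or 180 degrees")
    scale = 2.0 * math.sin(theta)
    axis = ((rot[2][1] - rot[1][2]) / scale,
            (rot[0][2] - rot[2][0]) / scale,
            (rot[1][0] - rot[0][1]) / scale)
    h = sum(a * t for a, t in zip(axis, trans))  # translation projected onto the axis
    n = 2.0 * math.pi / theta
    return n, h


def rot_z(degrees: float) -> list[list[float]]:
    """Rotation matrix about the z axis, for the example below."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


n, h = helix_parameters(rot_z(60.0), (1.0, 0.0, 3.0))
print(round(n, 6), round(h, 6))  # 6.0 3.0
```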


Of the three major classes of complex biological molecules, carbohydrates are the one about which we have the least structural information at atomic resolution. This is because few of them have been crystallized; consequently, apart from simple monomers to trimers, there is no relevant crystal structure data base on which to model classical or semiempirical quantum mechanical calculations. The blood group-specific oligosaccharides, cord factors, and lipids A and X are typical examples of compounds without crystal structures. The cyclodextrins are exceptions: they crystallize well, but they form a conformationally separate class. Structures derived from the fiber patterns of polysaccharides are model dependent and do not constitute a source of definitive structural data. Stachyose, an oligosaccharide of four sugar residues, is the largest noncyclic oligosaccharide for which there is a crystal structure analysis, but even in this case the associated water structure has not been determined.

The crystallinity problem is only partially intrinsic. Carbohydrates do not solvate in the same way as proteins, oligonucleotides, or nucleic acids, but fewer efforts have been made to obtain the substantial amounts of configurationally homogeneous material needed for crystallization experiments than were made for proteins and nucleic acids. Another aspect of the crystallography of glycoconjugates is that the electron density for the oligosaccharide portion of glycoproteins has rarely been interpreted, even though several crystalline glycoproteins have been studied. The oligosaccharide is left out of the model either because the standard refinement programs cannot handle it or because there is microheterogeneity at the site of glycosylation. Thus, a potentially valuable source of information is not being exploited for lack of appropriate programs or of strategies for dealing with microheterogeneity.

Steric considerations about the minimum approach distances between atoms, derived from nonbonded distances observed in various crystal structures, can be used to predict allowed conformations. This “hard sphere” approach, originally developed by V.S.R. Rao in the mid-1970s, is a rudimentary method of theoretical calculation that ignores electrostatic effects (such as hydrogen bonding) but does give a qualitative prediction of structure. The approach was subsequently extended by adapting energy calculations originally used for peptides, in which the potential energy is divided into functions describing discrete contributions such as van der Waals energies, electrostatic interactions, torsional energy, hydrogen bond energy, and bond and angle deformations (Bock, 1983). The results are presented as computer-generated energy contour maps.
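The hard-sphere criterion itself reduces to a distance check: a conformation is disallowed if any pair of nonbonded atoms is closer than the minimum contact distance for that pair of atom types. The sketch below implements only that test; the contact distances are placeholders for illustration, not Rao's parameters, and the step of rebuilding atomic coordinates for each (ϕ, ψ) point of a map is omitted.

```python
import math
from itertools import combinations

# Placeholder minimum contact distances in angstroms, keyed by element pair.
# These are illustrative values only, not a published hard-sphere parameter set.
MIN_CONTACT = {
    frozenset({"C"}): 3.0,
    frozenset({"O"}): 2.7,
    frozenset({"H"}): 2.0,
    frozenset({"C", "O"}): 2.8,
    frozenset({"C", "H"}): 2.4,
    frozenset({"O", "H"}): 2.4,
}

Atom = tuple[str, tuple[float, float, float]]  # (element, xyz)


def clashes(atoms: list[Atom], bonded_pairs: frozenset = frozenset()) -> list[tuple[int, int]]:
    """Index pairs of atoms, not listed as bonded, that are closer than allowed."""
    bad = []
    for (i, (el_i, xyz_i)), (j, (el_j, xyz_j)) in combinations(enumerate(atoms), 2):
        if frozenset({i, j}) in bonded_pairs:
            continue  # bonded pairs sit at covalent distances and are not tested
        if math.dist(xyz_i, xyz_j) < MIN_CONTACT[frozenset({el_i, el_j})]:
            bad.append((i, j))
    return bad


def is_allowed(atoms: list[Atom], bonded_pairs: frozenset = frozenset()) -> bool:
    """Hard-sphere criterion: allowed only if no nonbonded pair overlaps."""
    return not clashes(atoms, bonded_pairs)


atoms = [("O", (0.0, 0.0, 0.0)), ("O", (2.5, 0.0, 0.0)), ("C", (0.0, 4.0, 0.0))]
print(clashes(atoms))  # [(0, 1)]
```

Scanning ϕ and ψ over a grid and recording which points pass `is_allowed` would give the allowed and disallowed regions of a hard-sphere map; which atom pairs count as bonded must be supplied by the caller.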

In much of the recent literature, conformational energy calculations have been made using a form of Rao's parameters with an added torsional potential about one of the glycosidic bonds (the exoanomeric effect). This approach, known as the HSEA (hard-sphere exoanomeric) method (Bock, 1983), has been used successfully by Lemieux and Bock (1983), Carver and Brisson (1984), and others, although it contains a number of untested assumptions. Adding a hydrogen bond potential (the HEAH method) yields energy minima, and hence derived geometries, that differ from those obtained by the HSEA method.


Proton NMR methods provide detailed experimental data from which three-dimensional structures can be determined and compared with conformations arrived at by potential energy calculations. Carver and Cumming (1987) have generated contour maps of computed nuclear Overhauser effects (NOEs) of various high mannose oligosaccharides as a function of the torsional angles ϕ and ψ, and have related them to experimental results as well as to minimum energy conformations estimated by various potential energy calculations (Carver and Cumming, in press). Brisson and Carver (1983) evaluated the utility of this approach using two biantennary complex-type glycopeptides (see Figure 8-3). Because the NOE-derived conformations fell within a range centered on the minimum energy conformations derived from potential energy calculations, they concluded “that motional averaging is confined to a narrow range about one stable conformation” (Brisson and Carver, 1983). It now appears, however, that it is meaningless to seek a single NOE-derived conformation that corresponds to a single potential energy minimum, because the molecules may in fact occupy such minima for only a very small proportion of the time in solution. “Conformational flexibility must be incorporated into the theoretical treatment” (Carver and Cumming, 1987), and the calculation of energy surfaces therefore becomes extremely important. The latest studies by Cumming and Carver indicate that NOE-determined three-dimensional structures may differ significantly from any minimum energy conformation. They have concluded that the NOE-derived conformations in such cases might correspond to “virtual” conformations, defined by Jardetzky (1980) as computed structures that few if any molecules in solution actually adopt.
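Why an NOE-derived structure can be "virtual" follows from the distance dependence of the NOE. In the simplest treatment (an isolated proton pair in rapid exchange between conformers, an assumption made here for illustration), the observed effect scales with the population-weighted average of r⁻⁶, and the distance inferred from it is ⟨r⁻⁶⟩^(−1/6). Because r⁻⁶ is dominated by short distances, the inferred value can match neither conformer. The distances and populations below are hypothetical.

```python
def noe_effective_distance(distances: list[float], populations: list[float]) -> float:
    """Distance inferred from an NOE averaged over rapidly exchanging conformers:
    <r^-6>^(-1/6), with <...> the population-weighted mean."""
    if abs(sum(populations) - 1.0) > 1e-9:
        raise ValueError("populations must sum to 1")
    mean_inv6 = sum(p * r ** -6 for r, p in zip(distances, populations))
    return mean_inv6 ** (-1.0 / 6.0)


# Hypothetical proton pair: 2.5 A apart in one conformer, 4.5 A in the other.
r_eff = noe_effective_distance([2.5, 4.5], [0.5, 0.5])
print(round(r_eff, 2))  # 2.79
```

With equal populations, the inferred distance of about 2.79 Å lies close to the shorter distance, far from the arithmetic mean of 3.5 Å, and matches neither real conformer.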

FIGURE 8-3. Structures of two partially processed asparagine-linked carbohydrate chains. The bisecting β1,4-GlcNAc of B causes a conformational difference from that of A.

Scarsdale et al. (in press) have used a molecular mechanics-based program to model conformational averaging of NMR data. Conformations were calculated by combining molecular potentials with NMR data for the oligosaccharide moiety of an erythrocyte glycolipid composed of three neutral sugars and an amino sugar. The lowest energy conformer closely resembled a structure proposed earlier; however, the fit to the data improved when two equilibrating conformers were considered. Thus, it may be possible to determine solution conformations of complex carbohydrates, even in nonrigid cases, by combining calculations with constraints imposed by experimental NMR data.
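The benefit of allowing two equilibrating conformers can be illustrated with a much simpler calculation than the molecular mechanics approach of Scarsdale et al. Assuming two conformers of fixed geometry and linear r⁻⁶ averaging of the NOE, the population p of one conformer that best fits observations for several proton pairs has a closed-form least-squares solution. The sketch below is that simplified fit, not the published method, and it uses synthetic data generated from hypothetical distances at p = 0.3.

```python
def residual(p: float, obs: list[float], a: list[float], b: list[float]) -> float:
    """Sum of squared deviations of obs from the mixture p*a + (1-p)*b."""
    return sum((y - (p * ai + (1 - p) * bi)) ** 2 for y, ai, bi in zip(obs, a, b))


def fit_two_state(obs: list[float], a: list[float], b: list[float]) -> tuple[float, float]:
    """Least-squares population p of conformer A, clipped to [0, 1].

    Minimizing sum((y - b - p*(a - b))**2) gives p = sum((y-b)(a-b)) / sum((a-b)**2).
    """
    num = sum((y - bi) * (ai - bi) for y, ai, bi in zip(obs, a, b))
    den = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    p = min(1.0, max(0.0, num / den))
    return p, residual(p, obs, a, b)


# Hypothetical distances (angstroms) for three proton pairs in conformers A and B.
dist_a = [2.4, 3.8, 4.6]
dist_b = [4.2, 2.6, 3.0]
a = [r ** -6 for r in dist_a]
b = [r ** -6 for r in dist_b]
obs = [0.3 * ai + 0.7 * bi for ai, bi in zip(a, b)]  # synthetic data, p = 0.3

p, rss = fit_two_state(obs, a, b)
print(round(p, 3))  # 0.3
print(rss < residual(1.0, obs, a, b) and rss < residual(0.0, obs, a, b))  # True
```

Because the synthetic data come from a 30:70 mixture, the fit recovers p = 0.3, and its residual is far smaller than that of either single conformer, which mirrors qualitatively the improved fit reported for two equilibrating conformers.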

Despite the questions raised about the interpretation of NMR results and the value of potential energy minimizations, some important information has been collected about interactions of carbohydrate antigens with antibodies (Lemieux et al., 1985), of oligosaccharides with lectins such as concanavalin A (Sekharudu et al., 1986), and of oligosaccharides with glycosyltransferases (Carver and Cumming, 1987). Further refinements will depend on the development of an agreed-on set of potential energy functions that can be used with experimentally determined, NOE-derived three-dimensional structures to evaluate whether a given molecule is distributed among several low energy conformations or occupies a particular subset of them. Tvaroska and Perez (1986) have recently compared several conformational energy calculations and proposed a general strategy for oligosaccharides.

Computer time and access to appropriate parallel-processing array processors are important considerations in determining the level of research support in this area at the present time. Funding agencies should address the availability of machines that can calculate interatomic distances and van der Waals contributions extremely fast. Interestingly, the several supercomputers currently operating on campuses have not been used to capacity; perhaps advisory groups at these centers should direct efforts toward developing the necessary software for these machines and toward establishing a policy that reserves a portion of their time for computer modeling of three-dimensional structures.
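The demand for fast distance and van der Waals evaluation comes from the pairwise structure of these terms: a molecule of N atoms has N(N − 1)/2 atom pairs, and the whole sum must be recomputed at every point of a conformational map or every step of a minimization. The sketch below shows the plain pairwise loop for a 12-6 Lennard-Jones term, one common form of the van der Waals energy; the single ε and σ used for every pair are placeholder values, and real force fields also exclude bonded neighbors and use per-atom-type parameters.

```python
import math
from itertools import combinations

Point = tuple[float, float, float]


def lennard_jones_energy(coords: list[Point], epsilon: float = 0.1, sigma: float = 3.4) -> float:
    """Sum of 4*eps*[(sigma/r)^12 - (sigma/r)^6] over every atom pair.

    Placeholder parameters: one epsilon (kcal/mol) and sigma (angstroms) for all pairs.
    """
    energy = 0.0
    for p, q in combinations(coords, 2):  # N(N-1)/2 pairs
        sr6 = (sigma / math.dist(p, q)) ** 6
        energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


def pair_count(n_atoms: int) -> int:
    """Number of distinct atom pairs the loop above visits."""
    return n_atoms * (n_atoms - 1) // 2


# At r = 2**(1/6) * sigma the pair sits at the potential minimum, energy = -epsilon.
r_min = 2 ** (1 / 6) * 3.4
print(round(lennard_jones_energy([(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]), 6))  # -0.1
print(pair_count(100), pair_count(1000))  # 4950 499500
```

A tenfold increase in the number of atoms costs roughly a hundredfold more pair evaluations, and this inner loop is the part that vector and parallel hardware can accelerate.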


Structures that consist of more than one macromolecule act as a unit in biological phenomena such as catalysis by many enzymes, binding at a cell surface, and signal transduction across cell membranes. Any enzyme that consists of more than one subunit should be thought of as a supramolecular structure. When large numbers of subunits are involved, and perhaps carry out more than one function, special consideration may have to be given to their relative spatial orientations. One example is the replication of DNA by DNA polymerases, where complexes containing 10 or 12 proteins (called primosomes) are required to initiate replication. Ribosomes are even more complex, requiring at least 75 proteins to translate messenger RNA. Surfaces that consist of more than one macromolecule also often behave as a functional unit. For example, the uptake of cholesterol by many cells requires the interaction of a specific cell surface receptor with a polypeptide surface of a complex supramolecular structure called low density lipoprotein (LDL), which consists of protein, cholesterol, phospholipids, and triacylglycerols. Alteration of the LDL protein by acetylation of a Lys residue blocks the binding of LDL to its receptor and the uptake of cholesterol by the cell. Several hormones, including norepinephrine and epidermal growth factor (EGF), and other signals such as light (with rhodopsin) induce protein phosphorylation. EGF stimulates the growth of normal fibroblasts by binding to a specific transmembrane protein receptor on the cell surface. In this case the hormone signal is transduced by self-phosphorylation of the receptor on the intracellular side after the hormone binds, followed by other kinase-catalyzed phosphorylations of proteins, internalization of the EGF-EGF receptor complex, and a complex set of consequences in the nucleus and elsewhere in preparation for cell division. Bremer et al. (1986) recently found that GM3 ganglioside inhibits this process in an allosteric fashion by preventing the self-phosphorylation of the EGF receptor after EGF binding. To accomplish this, GM3 in the outer half of the cell membrane must interact with a domain of the polypeptide chain of the EGF receptor, probably causing a conformational change that prevents phosphorylation. A similar situation involving a lipid membrane is found with a mitochondrial enzyme, beta-hydroxybutyric dehydrogenase, which is catalytically active only when incorporated into a lipid bilayer composed of certain phospholipids. Computer-assisted mathematical modeling of such supramolecular structures will be necessary to gain a deeper understanding of the organization of biological materials for complex functions.



¹ Actually, it is more appropriate to refer to “one cistron-one polypeptide.” Even this is no longer strictly accurate, as more than one gene may contribute to the primary structure of a protein, e.g., the immunoglobulins.

Copyright © National Academy of Sciences.
Bookshelf ID: NBK218559

