NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Griffiths AJF, Miller JH, Suzuki DT, et al. An Introduction to Genetic Analysis. 7th edition. New York: W. H. Freeman; 2000.

  • By agreement with the publisher, this book is accessible by the search feature, but cannot be browsed.

Gene-protein relations

The one-gene–one-enzyme hypothesis was an impressive step forward in our understanding of gene function, but just how do genes control the functioning of enzymes? Virtually all enzymes are proteins, and thus we must review the basic facts of protein structure to follow the next step in the study of gene function.

Protein structure

In simple terms, a protein is a macromolecule composed of amino acids attached end to end in a linear string. The general formula for an amino acid is H2N−CHR−COOH, in which the side chain, or R (reactive) group, can be anything from a hydrogen atom (as in the amino acid glycine) to a complex ring (as in the amino acid tryptophan). There are 20 common amino acids in living organisms (Table 9-3), each having a different R group. Amino acids are linked together in proteins by covalent (chemical) bonds called peptide bonds. A peptide bond is formed through a condensation reaction that includes the removal of a water molecule (Figure 9-3).

Table 9-3. The 20 Amino Acids Common in Living Organisms.

Figure 9-3. The peptide bond. (a) A polypeptide is formed by the removal of water between amino acids to form peptide bonds. Each aa indicates an amino acid. R1, R2, and R3 represent R groups (side chains) that differentiate the amino acids. R can be anything from ...

Several amino acids linked together by peptide bonds form a molecule called a polypeptide; the proteins found in living organisms are large polypeptides. For instance, the α chain of human hemoglobin contains 141 amino acids, and some proteins consist of more than 1000 amino acids.
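The bookkeeping behind condensation can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original text: the function names and the approximate molar masses (water, glycine) are assumptions introduced here. A linear chain of n amino acids contains n − 1 peptide bonds, and one water molecule is released for each bond formed.

```python
WATER_MASS = 18.015  # approximate molar mass of H2O in g/mol (assumed value)


def peptide_bonds(n_residues: int) -> int:
    """Number of peptide bonds in a linear chain of n_residues amino acids."""
    if n_residues < 1:
        raise ValueError("a chain needs at least one residue")
    return n_residues - 1


def condensed_mass(free_amino_acid_masses) -> float:
    """Approximate molar mass of a linear polypeptide.

    Sums the masses of the free amino acids and subtracts one water
    molecule for every peptide bond formed.
    """
    masses = list(free_amino_acid_masses)
    return sum(masses) - peptide_bonds(len(masses)) * WATER_MASS


# The alpha chain of human hemoglobin has 141 amino acids.
print(peptide_bonds(141))

# Two glycines (about 75.07 g/mol each) joined into glycylglycine.
print(round(condensed_mass([75.07, 75.07]), 3))
```

The same subtraction applies to a chain of any length, which is why the mass of a protein is slightly less than the summed masses of its free amino acids.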

The properties of the amino acid side chains are responsible for the structure and function of each protein. These side chains vary in many chemical properties. A key property is the hydrophobic, or water repelling, character of each amino acid. Hydrophobic amino acids tend to avoid contact with water and are often turned toward the inside of the protein, whereas the amino acids that are charged or that can form hydrogen bonds with water are excluded from the interior of the protein and are turned toward the exterior or surface of the protein.
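One way to make the hydrophobic character concrete is to score each residue on a hydropathy scale. The sketch below uses the Kyte-Doolittle scale, which is not mentioned in the text and is brought in here only as an assumption for illustration; the function names and the example peptide are likewise hypothetical. A positive score suggests a side chain that tends to be buried, but where a residue actually sits depends on the whole fold.

```python
# Kyte-Doolittle hydropathy values, keyed by one-letter amino acid code.
# Positive values are more hydrophobic. (The scale is an assumption of this
# sketch; the text does not name one.)
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_labels(sequence: str, threshold: float = 0.0) -> str:
    """Label each residue 'h' (hydrophobic) or 'p' (polar or charged)."""
    return "".join(
        "h" if KYTE_DOOLITTLE[aa] > threshold else "p" for aa in sequence
    )


def mean_hydropathy(sequence: str) -> float:
    """Average hydropathy score over the whole sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


# Hypothetical four-residue peptide: Leu-Lys-Val-Glu.
print(hydropathy_labels("LKVE"))
```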

Proteins have a complex structure that is traditionally thought of as having four levels. The linear sequence of the amino acids in a polypeptide chain is called the primary structure of the protein. Figure 9-4 shows the linear sequence of tryptophan synthetase (an enzyme) and beef insulin (a hormonal protein).

Figure 9-4. Linear sequences of two proteins. (a) The E. coli tryptophan synthetase A protein, 268 amino acids long. (b) Bovine insulin protein. Note that the amino acid cysteine can form unique “sulfur bridges,” because it contains sulfur.

The secondary structure of a protein refers to the interrelations of amino acids that are close together in the linear sequence. This spatial arrangement often results from the fact that polypeptides can bend into regularly repeating (periodic) structures, created by hydrogen bonds between the CO and NH groups of different residues. Two of the basic periodic structures are the α helix (Figure 9-5) and the β pleated sheet (Figure 9-6).
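The periodicity of the α helix can be written down directly. In the helix, the CO group of residue n is hydrogen-bonded to the NH group of residue n + 4 (as noted in the caption of Figure 9-5). The following Python sketch lists those pairs for an idealized segment in which every residue is part of a single helix; the function name is hypothetical.

```python
def helix_hydrogen_bonds(n_residues: int):
    """Return (CO residue, NH residue) pairs for an idealized alpha helix.

    Residues are numbered from 1. Residue i pairs with residue i + 4,
    so the last four residues have no partner further along the chain.
    """
    return [(i, i + 4) for i in range(1, n_residues - 3)]


# A hypothetical 10-residue helical segment.
for co_residue, nh_residue in helix_hydrogen_bonds(10):
    print(f"CO of residue {co_residue} -> NH of residue {nh_residue}")
```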

Figure 9-5. The α helix, a common basis of secondary protein structure. Each R is a specific side chain on one amino acid. The black dots represent weak hydrogen bonds that bond the CO group of residue n to the NH group of residue n + 4, thereby ...

Figure 9-6. Two views of the antiparallel β pleated sheet, another common form of secondary protein structure. Adjacent strands run in opposite directions. Hydrogen bonds between NH and CO groups of adjacent strands stabilize the structure. The side chains ...

A protein also has a three-dimensional architecture, termed the tertiary structure, which is created by electrostatic, hydrogen-bonding, and van der Waals interactions between the various amino acid R groups, causing the protein chain to fold back on itself. In many cases, amino acids that are far apart in the linear sequence are brought close together in the tertiary structure. Often, two or more folded structures will bind together to form a quaternary structure; this structure is multimeric because it is composed of several separate polypeptide chains, or monomers.

Figure 9-7 depicts the four levels of protein structure. In Figure 9-7c, we can see the tertiary structure of myoglobin. Note how the α helix is folded back on itself to generate the three-dimensional shape of the protein. Figure 9-7d shows the combining of four subunits (two α chains and two β chains) to form the quaternary structure of hemoglobin. Figure 9-8 shows the structure of myoglobin in more detail. The combining of subunits to form a multimeric enzyme can be seen directly in the electron microscope in some cases (Figure 9-9).

Figure 9-7. Different levels of protein structure. (a) Primary structure. (b) Secondary structure. The polypeptide shown in part a is drawn into an α helix by hydrogen bonds. (c) Tertiary structure: the three-dimensional structure of myoglobin. (d) Quaternary ...

Figure 9-8. Folded tertiary structure of myoglobin, an oxygen-storage protein. Each dot represents an amino acid. The heme group, a cofactor that facilitates the binding of oxygen, is shown in blue. (From L. Stryer, Biochemistry, 4th ed. Copyright © 1995 ...

Figure 9-9. Electron micrograph of the enzyme aspartate transcarbamylase. Each small “glob” is an enzyme molecule. Note the quaternary structure: the enzyme is composed of subunits. (Photograph from Jack D. Griffith.)

Many proteins are compact structures; such proteins are called globular proteins. Enzymes and antibodies are among the important globular proteins. Other proteins, called fibrous proteins, have an extended, elongated shape and are important components of such structures as hair and muscle.

MESSAGE

The linear sequence of a protein folds up to yield a unique three-dimensional configuration. This configuration creates specific sites to which substrates bind and at which catalytic reactions take place. The three-dimensional structure of a protein, which is crucial for its function, is determined solely by the primary structure (linear sequence) of amino acids. Therefore, genes can control enzyme function by controlling the primary structure of proteins.

Protein motifs

Often, several elements of secondary structure combine to produce a pattern, or motif, that is found in numerous other proteins. We can sometimes recognize motifs by their amino acid sequence pattern and at other times by observing the three-dimensional structure. Figure 9-10 shows two examples. The helix-loop-helix motif is found in calcium-binding proteins, and a variant of it is found in regulatory proteins that bind DNA. The zinc-binding motif, also found in DNA-binding proteins, is termed the zinc finger, because of the way that the residues protrude outward, like a finger.

Figure 9-10. Secondary structure motifs. (a) Helix-loop-helix motif is a characteristic feature of many calcium-binding proteins, as shown here. (b) Zinc-finger motif, which is present in many proteins that bind nucleic acids. A Zn2+ ion is held between a pair of beta ...

Determining protein sequence

If we purify a particular protein, we find that we can specify a particular ratio of the various amino acids that make up that specific protein. But the protein is not formed by a random hookup of fixed amounts of the various amino acids; each protein has a unique, characteristic sequence. For a small polypeptide, the amino acid sequence can be determined by clipping off one amino acid at a time and identifying it. However, large polypeptides cannot be readily “sequenced” in this way.

Frederick Sanger worked out a brilliant method for deducing the sequence of large polypeptides. There are several different proteolytic enzymes—enzymes that can break peptide bonds only between specific amino acids in proteins. Proteolytic enzymes can break a large protein into a number of smaller fragments, which can then be separated according to their migration speeds in a solvent on chromatographic paper. Because different fragments will move at different speeds in various solvents, two-dimensional chromatography can be used to enhance the separation of the fragments (Figure 9-11). In this technique, a mixture of fragments is separated in one solvent; then the paper is turned 90° and another solvent is used. When the paper is stained, the polypeptides appear as spots in a characteristic chromatographic pattern called the fingerprint of the protein. Each of the spots can be cut out, and the polypeptide fragments can be washed from the paper. Because each spot contains only small polypeptides, their amino acid sequences can be easily determined.
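The specificity of proteolytic enzymes is what makes the fragments reproducible. The Python sketch below models a digest as cutting on the carboxyl side of chosen residues. The cleavage rules are simplified assumptions that the text does not state: a trypsin-like enzyme cutting after lysine (K) or arginine (R), and a chymotrypsin-like enzyme cutting after phenylalanine (F), tyrosine (Y), or tryptophan (W). The peptide sequence and the function name are made up for illustration.

```python
def digest(sequence: str, cut_after: str):
    """Split a sequence on the carboxyl side of each residue in cut_after.

    This is a simplified model of a specific proteolytic enzyme; real
    enzymes have exceptions that are ignored here.
    """
    fragments = []
    start = 0
    for i, amino_acid in enumerate(sequence):
        if amino_acid in cut_after:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Whatever follows the last cut site is the C-terminal fragment.
        fragments.append(sequence[start:])
    return fragments


peptide = "GAKLFRMEYKW"  # hypothetical sequence, one-letter codes

print(digest(peptide, "KR"))   # trypsin-like rule (assumed)
print(digest(peptide, "FYW"))  # chymotrypsin-like rule (assumed)
```

Each enzyme yields a different set of small fragments from the same protein, and each set would give its own fingerprint.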

Figure 9-11. Two-dimensional chromatographic fingerprinting of a polypeptide fragment mixture. A protein is digested by a proteolytic enzyme into fragments that are only a few amino acids long. A piece of chromatographic filter paper is then spotted with this mixture ...

Using different proteolytic enzymes to cleave the protein at different points, we can repeat the experiment to obtain other sets of fragments. The fragments from the different treatments overlap, because the breaks are made in different places with each treatment. The problem of solving the overall sequence then becomes one of fitting together the small-fragment sequences—almost like solving a tricky jigsaw or crossword puzzle (Figure 9-12).
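The jigsaw logic can be sketched as a small brute-force search in Python. This is an illustrative sketch under stated assumptions, not Sanger's actual procedure: the fragment sequences are hypothetical, the function names are invented here, and trying every ordering is practical only for a handful of fragments. The idea is that a proposed ordering of the fragments from one digest is acceptable only if the resulting sequence can also be cut exactly into the fragments from the second digest.

```python
from itertools import permutations


def can_tile(sequence: str, fragments) -> bool:
    """True if sequence can be cut into exactly these fragments, each used once."""
    if not sequence:
        return not fragments
    for i, fragment in enumerate(fragments):
        remaining = fragments[:i] + fragments[i + 1:]
        if sequence.startswith(fragment) and can_tile(
            sequence[len(fragment):], remaining
        ):
            return True
    return False


def reconstruct(digest_a, digest_b):
    """Return every ordering of digest_a that is consistent with digest_b."""
    candidates = set()
    for order in permutations(digest_a):
        candidate = "".join(order)
        if can_tile(candidate, list(digest_b)):
            candidates.add(candidate)
    return sorted(candidates)


# Hypothetical fragments from two different enzymes acting on one protein.
fragments_enzyme_1 = ["MEYK", "W", "GAK", "LFR"]
fragments_enzyme_2 = ["KW", "GAKLF", "RMEY"]

print(reconstruct(fragments_enzyme_1, fragments_enzyme_2))
```

When more than one ordering survives, the two digests do not overlap enough to fix the sequence, and a third enzyme would be needed.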

Figure 9-12. Alignment of polypeptide fragments to reconstruct an entire amino acid sequence. Different proteolytic enzymes can be used on the same protein to form different fingerprints, as shown here. The amino acid sequence of each fragment can be determined rather ...

Using this elegant technique, Sanger confirmed that the sequence of amino acids (as well as the amounts of the various amino acids) is specific to a particular protein. In other words, the amino acid sequence is what makes insulin insulin.

Relation between gene mutations and altered proteins

We now know that the change of just one amino acid is sometimes enough to alter protein function. This was first shown in 1957 by Vernon Ingram, who studied the globular protein hemoglobin—the molecule that transports oxygen in red blood cells. As shown in Figure 9-7d, hemoglobin is made up of four polypeptide chains: two identical α chains, each containing 141 amino acids, and two identical β chains, each containing 146 amino acids.

Ingram compared hemoglobin A (HbA), the hemoglobin from normal adults, with hemoglobin S (HbS), the protein from people homozygous for the mutant gene that causes sickle-cell anemia, the disease in which red blood cells take on a sickle-cell shape (see Figure 4-2). Using Sanger’s technique, Ingram found that the fingerprint of HbS differs from that of HbA in only one spot. Sequencing that spot from the two kinds of hemoglobin, Ingram found that only one amino acid in the fragment differs in the two kinds. Apparently, of all the amino acids known to make up a hemoglobin molecule, a substitution of valine for glutamic acid at just one point, position 6 in the β chain, is all that is needed to produce the defective hemoglobin (Figure 9-13). Unless patients with HbS receive medical attention, this single error in one amino acid in one protein will hasten their death. Figure 9-14 shows how this gene mutation ultimately leads to the pattern of sickle-cell disease.
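Ingram's comparison amounts to aligning two sequences position by position and reporting where they differ. A minimal Python sketch follows, using the first seven amino acids of the β chain in HbA and HbS; the function name is an assumption of this example.

```python
# First seven amino acids of the beta chain, N-terminus first.
HBA_BETA = ["Val", "His", "Leu", "Thr", "Pro", "Glu", "Glu"]
HBS_BETA = ["Val", "His", "Leu", "Thr", "Pro", "Val", "Glu"]


def substitutions(normal, mutant):
    """List (position, normal residue, mutant residue) for each difference.

    Positions are numbered from 1, as in protein sequences.
    """
    if len(normal) != len(mutant):
        raise ValueError("sequences must be the same length to compare")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(normal, mutant), start=1)
        if a != b
    ]


print(substitutions(HBA_BETA, HBS_BETA))  # [(6, 'Glu', 'Val')]
```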

Figure 9-13. The difference at the molecular level between normalcy and sickle-cell disease. Shown are only the first seven amino acids; all the rest not shown are identical. (From Anthony Cerami and Charles M. Peterson, “Cyanate and Sickle-Cell Disease.” ...

Figure 9-14. The compounded consequences of one amino acid substitution in hemoglobin to produce sickle-cell anemia.

Notice what Ingram accomplished. A gene mutation that had been well established through genetic studies was connected with an altered amino acid sequence in a protein. Subsequent studies identified numerous changes in hemoglobin, and each one is the consequence of a single amino acid difference. (Figure 9-15 shows a few examples.) We can conclude that one mutation in a gene corresponds to a change of one amino acid in the sequence of a protein.

Figure 9-15. Some single amino acid substitutions found in human hemoglobin. Amino acids are normal at all residue positions except those indicated. Each type of change causes disease. (Names indicate areas in which cases were first identified.)

MESSAGE

Genes determine the specific primary sequences of amino acids in specific proteins.

Colinearity of gene and protein

Once the structure of DNA had been determined by Watson and Crick, it became apparent that the structure of proteins must be encoded in the linear sequence of nucleotides in the DNA. (We shall see in Chapter 10 how this genetic code was deciphered.) After Ingram’s demonstration that one mutation alters one amino acid in a protein, a relation was sought between the linear sequence of mutant sites in a gene and the linear sequence of amino acids in a protein. (It is possible to map mutational sites within a gene by high-resolution recombination analysis, as we saw in the rII system, described in Chapter 7 and expanded later in this chapter.)

Charles Yanofsky probed the relation between altered genes and altered proteins by studying the enzyme tryptophan synthetase in E. coli. This enzyme catalyzes the conversion of indole glycerol phosphate into tryptophan. Two genes, trpA and trpB, control the enzyme. Each gene controls a separate polypeptide; after the A and B polypeptides are produced, they combine to form the active enzyme (a multimeric protein). Yanofsky analyzed mutations in the trpA gene that resulted in alterations of the tryptophan synthetase A subunit. He produced a detailed map of the mutations, and he then determined the amino acid sequence of each respective altered tryptophan synthetase. His results were similar to Ingram’s for hemoglobin: each mutant had a defective polypeptide associated with a specific amino acid substitution at a specific point. However, Yanofsky was able to show an exciting correlation that Ingram was not able to observe, owing to the limitations of his system. Yanofsky found an exact match between the sequence of the mutational sites in the gene map of the trpA gene and the location of the corresponding altered amino acids in the A polypeptide chain. The farther apart two mutational sites were in map units, the more amino acids there were between the corresponding substitutions in the polypeptide (Figure 9-16). Thus, Yanofsky demonstrated colinearity—the correspondence between the linear sequence of the gene and that of the polypeptide. Figure 9-17 shows the complete set of data.
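Colinearity is a statement about order: if the mutants are sorted by their position on the genetic map, the positions of their altered amino acids should come out sorted as well. The Python sketch below tests that property. The mutant names, map positions, and residue numbers are made-up illustrative values, not Yanofsky's measurements, and the function name is hypothetical.

```python
def is_colinear(mutants) -> bool:
    """Check that map order matches residue order.

    mutants is a list of (name, map_position, residue_position) tuples.
    Equal residue positions are allowed, since two different mutations
    can alter the same amino acid.
    """
    by_map = sorted(mutants, key=lambda mutant: mutant[1])
    residues = [residue for _, _, residue in by_map]
    return all(a <= b for a, b in zip(residues, residues[1:]))


# Illustrative data only: (mutant, map position in map units, altered residue).
hypothetical_mutants = [
    ("m3", 2.9, 175),
    ("m1", 0.0, 15),
    ("m4", 3.3, 183),
    ("m2", 1.2, 49),
    ("m5", 4.1, 211),
]

print(is_colinear(hypothetical_mutants))
```

A single mutant whose altered residue fell out of order relative to its map position would make the check fail, which is what a departure from colinearity would look like in data of this kind.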

Figure 9-16. Simplified representation of the colinearity of gene mutations. The genetic map of point mutations (determined by recombinational analysis) corresponds linearly to the changed amino acids in the different mutants (determined by fingerprint analysis). ...

Figure 9-17. Actual colinearity shown in the A protein of tryptophan synthetase from E. coli. There is a linear correlation between the mutational sites and the altered amino acid residues. (After C. Yanofsky, “Gene Structure and Protein Structure.” ...

MESSAGE

The linear sequence of nucleotides in a gene determines the linear sequence of amino acids in a protein.

X-ray determination of three-dimensional structure of proteins

X-ray crystallography is a powerful method for determining the three-dimensional structure of a protein in atomic detail. John Kendrew first applied this method to a molecule as complex as a protein to elucidate the structure of myoglobin in 1957, and Max Perutz succeeded in unraveling the complexities of hemoglobin several years later. Now, the structures of hundreds of proteins are known. In this technique, crystals of the protein are obtained in a concentrated salt solution of the pure protein. Then, a narrow beam of X rays is passed through the crystal. The repeating pattern of atoms in the protein complex scatters (or diffracts) the X-ray beams, giving a pattern of spots on X-ray film, as depicted in Figure 9-18a (left); see also Figure 9-18b. Information about the electron density in different parts of the protein is contained in the position and intensity of each spot.

Figure 9-18. The use of X rays to determine the structure of enzymes. (a) A beam of X rays is passed through a crystal, and the pattern of spots on an X-ray film is used to generate electron-density maps and contour maps. (b) The X-ray diffraction pattern given by ...

Sophisticated mathematical analysis is used to generate electron-density maps (Figure 9-18a; right), which are in turn used to derive contour maps of the protein. Ultimately, detailed models of a protein can be built, as shown in the space-filling model of chymotrypsin (Figure 9-18c). Amazingly, this model stems from the pattern of spots seen in Figure 9-18b.

Enzyme function

How can a single amino acid substitution, such as that in sickle-cell hemoglobin (Figure 9-13), have such an enormous effect on protein function and the phenotype of an organism? Take enzymes, for example. Enzymes are known to do their job of catalysis by physically grappling with their substrate molecules, twisting or bending the molecules to make or break chemical bonds. Figure 9-19 shows the digestive enzyme carboxypeptidase in its relaxed position and after grappling with its substrate molecule, glycyltyrosine. The substrate molecule fits into a notch in the enzyme structure; this notch is called the active site.

Figure 9-19. The active site of the digestive enzyme carboxypeptidase. (a) The enzyme without substrate. (b) The enzyme with its substrate (gold) in position. Three crucial amino acids (red) have changed positions to move closer to the substrate. Carboxypeptidase ...

Figure 9-20 diagrams the general concept. Note that there are two basic types of reactions performed by enzymes: (1) the breakdown of a substrate into simpler products and (2) the synthesis of a complex product from one or more simpler substrates.

Figure 9-20. Schematic representation of the action of a hypothetical enzyme in putting two substrate molecules together. (a) In the lock-and-key mechanism the substrates have a complementary fit to the enzyme’s active site. (b) In the induced-fit model, binding ...

Much of the globular structure of an enzyme is nonreactive material that simply supports the active site. So we might expect that amino acid substitutions throughout most of the structure would have little effect, whereas very specific amino acids would be required for the part of the enzyme molecule that gives the precise shape to the active site. Hence, the possibility arises that a functional enzyme does not always require a unique amino acid sequence for the entire polypeptide. This possibility has proved to be the case: in a number of systems, numerous positions in a polypeptide can be filled by several alternative amino acids, and enzyme function is retained. But, at certain other positions in the polypeptide, only the wild-type amino acid will preserve activity; in all likelihood, these amino acids form critical parts of the active sites. Some of these critical amino acids in carboxypeptidase are indicated in red in Figure 9-19.
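The distinction between tolerant and critical positions can be captured in a deliberately crude toy model in Python. Everything in it is an assumption made for illustration: the sequence, the choice of critical positions, the function name, and above all the rule itself, which treats any change at a critical position as inactivating and any change elsewhere as harmless. Real proteins are less clear-cut.

```python
def retains_activity(wild_type: str, variant: str, critical_positions) -> bool:
    """Toy rule: activity survives if every critical position is unchanged.

    Positions are numbered from 1. Substitutions outside the critical
    positions are assumed to be tolerated.
    """
    if len(wild_type) != len(variant):
        raise ValueError("this toy model only handles substitutions")
    return all(
        variant[position - 1] == wild_type[position - 1]
        for position in critical_positions
    )


wild_type = "MKTAYIAK"          # hypothetical polypeptide
active_site_positions = {3, 6}  # hypothetical critical residues

print(retains_activity(wild_type, "LKTAYIAK", active_site_positions))  # change at 1
print(retains_activity(wild_type, "MKSAYIAK", active_site_positions))  # change at 3
```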

MESSAGE

Protein architecture is the key to gene function. A gene mutation typically results in the substitution of a different amino acid into the polypeptide sequence of a protein. The new amino acid may have chemical properties that are incompatible with the proper protein architecture at that particular position; in such a case, the mutation will lead to a nonfunctional protein.

Genes and cellular metabolism: genetic diseases

When we think of enzyme activity in relation to cellular metabolism, we realize that the inactivation of one or more enzymes can have staggering consequences. Most of us have been amazed by the charts on laboratory walls showing the myriad interlocking, branched, and circular pathways along which the cell’s chemical intermediates are shunted like parts on an assembly line. Bonds are broken, molecules cleaved, molecules united, groups added or removed, and so forth. The key fact is that almost every step, represented by an arrow on the metabolic chart, is controlled (mediated) by an enzyme, and each of these enzymes is produced under the direction of a gene that specifies its function. Change one critical gene, and the entire assembly line can break down.

Humans provide some startling examples. The list in Table 9-4 gives some representative examples and suggests the magnitude of genetic involvement in human disease. Figure 9-21 shows a corner of the human metabolic map to illustrate how a set of diseases, some of them common and familiar to us, can stem from the blockage of adjacent steps in biosynthetic pathways.

Table 9-4. Representative Examples of Enzymopathies: Inherited Disorders in Which Altered Activity (Usually Deficiency) of a Specific Enzyme Has Been Demonstrated in Humans.

Figure 9-21. One small part of the human metabolic map, showing the consequences of various specific enzyme failures. (Disease phenotypes are shown in colored boxes.) (After I. M. Lerner and W. J. Libby, Heredity, Evolution, and Society, 2d ed. Copyright © ...

Genetic observations explained by enzyme structure

Armed with an understanding of the gene-protein relation and how enzymes function, we can now reexamine some of the genetic findings presented in earlier chapters and look at them in regard to the biochemistry.

A good example can be found in temperature-sensitive alleles. Recall that some mutants appear to be wild type at normal temperatures but can be detected as mutants at high or low temperatures. We now know that such mutations result from the substitution of an amino acid that produces a protein that is functional at normal temperatures, called permissive temperatures, but is distorted and nonfunctional at high or low temperatures, called restrictive temperatures (Figure 9-22).

Figure 9-22. Schematic representation of protein conformational distortion, which is probably the basis for temperature sensitivity in certain mutants. An amino acid substitution that has no significant effect at normal (permissive) temperatures may cause significant ...

As we have seen, conditional mutations such as temperature-sensitive mutations can be very useful to geneticists. Stocks of the mutant culture can be easily maintained under permissive conditions, and the mutant phenotype can be studied intensively under restrictive conditions. Such mutants can be very handy in the genetic dissection of biological systems. For example, with a temperature-sensitive allele, we can shift to a restrictive temperature at various times in the course of development to determine the time at which a gene is active.

Copyright © 2000, W. H. Freeman and Company.
Bookshelf ID: NBK21811