NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Griffiths AJF, Miller JH, Suzuki DT, et al. An Introduction to Genetic Analysis. 7th edition. New York: W. H. Freeman; 2000.

  • By agreement with the publisher, this book is accessible by the search feature, but cannot be browsed.

Genetic code

If genes are segments of DNA and if DNA is just a string of nucleotide pairs, then how does the sequence of nucleotide pairs dictate the sequence of amino acids in proteins? The analogy to a code springs to mind at once. The cracking of the genetic code is the story told in this section. The experimentation was sophisticated and swift, and it did not take long for the code to be deciphered once its existence was strongly indicated.

Simple logic tells us that, if nucleotide pairs are the “letters” in a code, then a combination of letters can form “words” representing different amino acids. We must ask how the code is read. Is it overlapping or nonoverlapping? Then we must ask how many letters in the mRNA make up a word, or codon, and which specific codon or codons represent each specific amino acid.

Overlapping versus nonoverlapping codes

Figure 10-24 shows the difference between an overlapping and a nonoverlapping code. In the example, a three-letter, or triplet, code is shown. For the nonoverlapping code, consecutive amino acids are specified by consecutive code words (codons), as shown at the bottom of Figure 10-24. For an overlapping code, consecutive amino acids are encoded in the mRNA by codons that share some consecutive bases; for example, the last two bases of one codon may also be the first two bases of the next codon. Overlapping codons are shown in the upper part of Figure 10-24. Thus, for the sequence AUUGCUCAG in a nonoverlapping code, the first three amino acids are encoded by the three triplets AUU, GCU, and CAG, respectively. However, in an overlapping code, the first three amino acids are encoded by the triplets AUU, UUG, and UGC if the overlap is two bases, as shown in Figure 10-24.
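Here is a minimal Python sketch that makes the distinction concrete. It splits the example sequence AUUGCUCAG into codons both ways. The function names are hypothetical, and the overlapping version assumes the two-base overlap used in the example.

```python
def nonoverlapping_codons(seq: str) -> list[str]:
    """Consecutive triplets; each base belongs to exactly one codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def overlapping_codons(seq: str, overlap: int = 2) -> list[str]:
    """Triplets that share `overlap` bases with the next codon."""
    step = 3 - overlap
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]


seq = "AUUGCUCAG"
print(nonoverlapping_codons(seq))    # ['AUU', 'GCU', 'CAG']
print(overlapping_codons(seq)[:3])   # ['AUU', 'UUG', 'UGC']
```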

Figure 10-24. The difference between an overlapping and a nonoverlapping code. The case illustrated is for a code with three letters (a triplet code). An overlapping code uses codons that employ some of the same nucleotides as those of other codons for the translation …

By 1961, it was already clear that the genetic code was nonoverlapping. The analysis of mutationally altered proteins, in particular the nitrous acid–generated mutants of tobacco mosaic virus, showed that a single mutation changes only one amino acid in a given region of the protein. This result is predicted by a nonoverlapping code. As you can see from Figure 10-24, an overlapping code predicts that a single base change will alter as many as three amino acids at adjacent positions in the protein.
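A small sketch can check this prediction. It uses the same example sequence and an arbitrarily chosen substitution (C to A at the fifth base). The helper names are hypothetical. A step of 3 models the nonoverlapping code, and a step of 1 models the overlapping code with a two-base overlap.

```python
def codons(seq: str, step: int) -> list[str]:
    """step=3: nonoverlapping triplets; step=1: triplets overlapping by two bases."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]


def codons_changed(seq: str, pos: int, new_base: str, step: int) -> int:
    """Number of codons altered by substituting new_base at index pos."""
    mutant = seq[:pos] + new_base + seq[pos + 1:]
    return sum(old != new for old, new in zip(codons(seq, step), codons(mutant, step)))


seq = "AUUGCUCAG"
print(codons_changed(seq, 4, "A", step=3))  # 1: only GCU becomes GAU
print(codons_changed(seq, 4, "A", step=1))  # 3: UGC, GCU, and CUC all change
```

A change at either end of the message falls within fewer overlapping codons. That is why the prediction is "as many as" three amino acids rather than exactly three.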

It should be noted that, although the use of an overlapping code was ruled out by the analysis of single proteins, nothing precluded the use of alternative reading frames to encode amino acids in two different proteins. In the example here, one protein might be encoded by the series of codons that reads AUU, GCU, CAG, CUU, and so forth. A second protein might be encoded by codons that are shifted over by one base and therefore read UUG, CUC, AGC, UUG, and so forth. This is an example of storing the information encoding two different proteins in two different reading frames, while still using a genetic code that is read in a nonoverlapping manner during the translation of a specific protein. Some examples of such shifts in reading frame have been found.
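Reading the same message in a second frame only requires an offset. In this sketch the message AUUGCUCAGCUUG is hypothetical. It extends the codons listed above by one base so that both frames yield four complete triplets. The function name is also hypothetical.

```python
def reading_frame(seq: str, offset: int) -> list[str]:
    """Nonoverlapping triplets starting `offset` bases from the 5' end."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


message = "AUUGCUCAGCUUG"            # hypothetical message
print(reading_frame(message, 0))     # ['AUU', 'GCU', 'CAG', 'CUU']
print(reading_frame(message, 1))     # ['UUG', 'CUC', 'AGC', 'UUG']
```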

Number of letters in the code

In reading an mRNA molecule from one particular end, only one of four different bases, A, U, G, or C, can be found at each position. Thus, if the words were one letter long, only four words would be possible. This vocabulary cannot be the genetic code, because we must have a word for each of the 20 amino acids commonly found in cellular proteins. If the words were two letters long, then 4² = 16 words would be possible; for example, AU, CU, or CC. This vocabulary is still not large enough.

If the words are three letters long, then 4³ = 64 words are possible; for example, AUU, GCG, or UGC. This vocabulary provides more than enough words to describe the amino acids. We can conclude that the code word must consist of at least three nucleotide pairs. However, if all words are “triplets,” then we have a considerable excess of possible words over the 20 needed to name the common amino acids.
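The counting argument amounts to raising 4 to the power of the word length. The short sketch below finds the smallest fixed word length whose vocabulary covers the 20 common amino acids. Its function names are hypothetical.

```python
def vocabulary_size(word_length: int, alphabet_size: int = 4) -> int:
    """Distinct words of a fixed length over an alphabet of the given size."""
    return alphabet_size ** word_length


def min_word_length(needed: int, alphabet_size: int = 4) -> int:
    """Smallest fixed word length that yields at least `needed` words."""
    length = 1
    while vocabulary_size(length, alphabet_size) < needed:
        length += 1
    return length


for n in (1, 2, 3):
    print(n, vocabulary_size(n))   # prints 1 4, 2 16, 3 64 on separate lines
print(min_word_length(20))         # 3
```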

Use of suppressors to demonstrate a triplet code

Convincing proof that a codon is, in fact, three letters long (and no more than three) came from beautiful genetic experiments first reported in 1961 by Francis Crick, Sidney Brenner, and their co-workers, who used mutants in the rII locus of T4 phage. Mutations causing the rII phenotype (see Chapter 9) were induced by using a chemical called proflavin, which was thought to act by the addition or deletion of single nucleotide pairs in DNA. (This assumption is based on experimental evidence not presented here.) The following examples illustrate the action of proflavin on double-stranded DNA.

Image ch10e4.jpg

Then, starting with one particular proflavin-induced mutation called FCO, Crick and his colleagues found “reversions” (reversals of the mutation) that were detected by their wild-type plaques on E. coli strain K(λ). Genetic analysis of these plaques revealed that the “revertants” were not identical to true wild types, suggesting that the back mutation was not an exact reversal of the original forward mutation. In fact, the reversion was found to be caused by a second mutation at a different site from that of FCO but in the same gene; this second mutation “suppressed” mutant expression of the original FCO. Recall from Chapter 4 that a suppressor mutation counteracts or suppresses the effects of another mutation.

The suppressor mutation could be separated from the original forward mutation by recombination, and, as we have seen, when this was done, the suppressor was shown to be an rII mutation itself (Figure 10-25).

Figure 10-25. The suppressor of an initial rII mutation is shown to be an rII mutation itself after separation by crossing over. The original mutant, FCO, was induced by proflavin. Later, when the FCO strain was treated with proflavin again, a revertant was found, …

How can we explain these results? Assume that reading is polarized, that is, that the gene is read from one end only. The original proflavin-induced addition or deletion could then produce the mutant phenotype because it interrupts a normal reading mechanism that establishes which groups of bases are read as words. For example, if each group of three bases on the resulting mRNA makes a word, then the “reading frame” might be established by taking the first three bases from the end as the first word, the next three as the second word, and so forth. In that case, a proflavin-induced addition or deletion of a single pair in the DNA would shift the reading frame on the mRNA from that point on, causing all following words to be misread. Such a frameshift mutation could reduce most of the genetic message to gibberish. However, a compensatory insertion or deletion somewhere else could restore the proper reading frame, leaving only a short stretch of gibberish between the two. Consider the following example, in which three-letter English words represent the codons:

Image ch10e5.jpg

The insertion suppresses the effect of the deletion by restoring most of the sense of the sentence. By itself, however, the insertion also disrupts the sentence:

Image ch10e6.jpg

If we assume that the FCO mutant is caused by an addition, then the second (suppressor) mutant would have to be a deletion because, as we have seen, this would restore the reading frame of the resulting message (a second insertion would not correct the frame). In the following diagrams, we use a hypothetical nucleotide chain to represent RNA for simplicity. We also assume that the code words are three letters long and are read in one direction (left to right in our diagrams).


Wild-type message

Image ch10e7.jpg


rIIa message: distal words changed (x) by frameshift mutation (words marked ✓ are unaffected)

Image ch10e8.jpg


rIIa rIIb message: few words wrong, but reading frame restored for later words

Image ch10e9.jpg

The few wrong words in the suppressed genotype could account for the fact that the “revertants” (suppressed phenotypes) that Crick and his associates recovered did not look exactly like the true wild types phenotypically.
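The diagrams above can be mimicked on a toy message. This sketch rests on several assumptions:

- The wild-type message is a hypothetical repeat of CAU.
- Words are triplets read left to right from the first base.
- The positions of the addition (plus) and the deletion (minus) are arbitrary choices.

Each word is marked ✓ if it matches the wild type and x if it does not. All helper names are hypothetical.

```python
def triplets(seq: str) -> list[str]:
    """Words read three bases at a time from the first base."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def add_base(seq: str, pos: int, base: str) -> str:
    """Insert one base before index pos (a 'plus' mutation)."""
    return seq[:pos] + base + seq[pos:]


def delete_base(seq: str, pos: int) -> str:
    """Remove the base at index pos (a 'minus' mutation)."""
    return seq[:pos] + seq[pos + 1:]


def marks(wild: str, mutant: str) -> str:
    """'✓' for each word identical to the wild type, 'x' for each changed word."""
    return "".join("✓" if w == m else "x" for w, m in zip(triplets(wild), triplets(mutant)))


wild = "CAU" * 8                      # hypothetical wild-type message
plus = add_base(wild, 4, "G")         # addition inside word 2
plus_minus = delete_base(plus, 13)    # compensating deletion inside word 5
print(marks(wild, plus))              # ✓xxxxxxx
print(marks(wild, plus_minus))        # ✓xxxx✓✓✓
```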

We have assumed here that the original frameshift mutation was an addition, but the explanation works just as well if we assume that the original FCO mutation is a deletion and the suppressor is an addition. If FCO is defined as plus, then its suppressor mutations are automatically minus. Experiments have confirmed that a plus cannot suppress a plus and a minus cannot suppress a minus. In other words, two mutations of the same sign never act as suppressors of each other. Interestingly, however, combinations of three pluses or three minuses have been shown to act together to restore a wild-type phenotype.

This observation provided the first experimental confirmation that a word in the genetic code consists of three successive nucleotide pairs, or a triplet. The reason is that three additions or three deletions within a gene automatically restore the reading frame in the mRNA if the words are triplets. For example,

Image ch10e10.jpg
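The rule behind these results is that downstream words return to the original frame only when the net number of added or deleted bases is a multiple of the word length. This minimal sketch counts each single-base addition as +1 and each deletion as −1, an encoding chosen for illustration. The function name is hypothetical.

```python
def frame_restored(shifts: tuple[int, ...], word_length: int = 3) -> bool:
    """True if downstream words return to the original frame.

    Each single-base addition counts +1 and each deletion -1; the frame is
    restored only when the net shift is a multiple of the word length.
    """
    return sum(shifts) % word_length == 0


for shifts in [(+1,), (+1, -1), (+1, +1), (-1, -1), (+1, +1, +1), (-1, -1, -1)]:
    print(shifts, frame_restored(shifts))

# Under a hypothetical four-letter code, three pluses would NOT restore the frame.
print(frame_restored((+1, +1, +1), word_length=4))  # False
```

Among word lengths, only 3 (and 1, already excluded by the counting argument) lets three pluses restore the frame. That is why the triple-plus and triple-minus results point to triplets.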

Proof that the genetic deductions about proflavin were correct came from an analysis of proflavin-induced mutations in a gene with a protein product that could be analyzed. George Streisinger worked with the gene that controls the enzyme lysozyme, which has a known amino acid sequence. He induced a mutation in the gene with proflavin and selected for proflavin-induced revertants, which were shown genetically to be double mutants (with mutations of opposite sign). When the protein of the double mutant was analyzed, a stretch of different amino acids lay between two wild-type ends, just as predicted:

Image ch10e11.jpg

Degeneracy of the genetic code

Crick’s work also suggested that the genetic code is degenerate. That expression is not a moral indictment. It simply means that more than one triplet can specify the same amino acid. Crick argued that each of the 64 triplets must have some meaning within the code, so at least some amino acids must be specified by two or more different triplets. Suppose only 20 triplets were used, with the other 44 being nonsense because they do not code for any amino acid. Most frameshift mutations would then be expected to produce nonsense words, which presumably stop the protein-building process. If this were the case, then the suppression of frameshift mutations would rarely, if ever, work. However, if all triplets specified some amino acid, then the changed words would simply result in the insertion of incorrect amino acids into the protein. Thus, many or all amino acids must have several different names in the base-pair code; this hypothesis was later confirmed biochemically.
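Crick's argument can be given rough numbers. The sketch below rests on a simplifying assumption that is not in the text. It treats each out-of-frame codon between a frameshift and its suppressor as an independent, uniformly random triplet. Under that assumption, if only 20 of the 64 triplets had meaning, a shifted stretch of k codons would contain no nonsense word with probability (20/64)^k. The function name is hypothetical.

```python
from fractions import Fraction


def p_all_sense(sense_triplets: int, stretch: int, total: int = 64) -> Fraction:
    """Chance that `stretch` out-of-frame codons are all sense words,
    ASSUMING each behaves like an independent, uniformly random triplet."""
    return Fraction(sense_triplets, total) ** stretch


for k in (1, 3, 5):
    p = p_all_sense(20, k)
    print(k, p, round(float(p), 4))
print(p_all_sense(64, 5))  # 1: if every triplet has a meaning, no nonsense arises
```

Under this assumption, even a five-codon stretch would be read without nonsense only about 0.3 percent of the time. Frequent suppression would be hard to reconcile with a code containing 44 nonsense triplets.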


The discussion up to this point demonstrates that


The genetic code is nonoverlapping.


Three bases encode an amino acid. These triplets are termed codons.


The code is read from a fixed starting point and continues to the end of the coding sequence. We know this because a single frameshift mutation anywhere in the coding sequence alters the codon alignment for the rest of the sequence.


The code is degenerate in that some amino acids are specified by more than one codon.

Cracking the code

The deciphering of the genetic code—determining the amino acid specified by each triplet—was one of the most exciting genetic breakthroughs of the past 50 years. Once the necessary experimental techniques became available, the genetic code was broken in a rush.

The first breakthrough was the discovery of how to make synthetic mRNA. If the nucleotides of RNA are mixed with a special enzyme (polynucleotide phosphorylase), a single-stranded RNA is formed in the reaction. No DNA is needed for this synthesis, and so the nucleotides are incorporated at random. The ability to synthesize mRNA offered the exciting prospect of creating specific mRNA sequences and then seeing which amino acids they would specify. The first synthetic messenger obtained, poly(U), was made by reacting only uracil nucleotides with the RNA-synthesizing enzyme, producing –UUUU–. In 1961, Marshall Nirenberg and Heinrich Matthaei mixed poly(U) with the protein-synthesizing machinery of E. coli in vitro and observed the formation of a protein! The main excitement centered on the question of the amino acid sequence of this protein. It proved to be polyphenylalanine, a string of phenylalanine molecules attached to form a polypeptide. Thus, the triplet UUU must code for phenylalanine:

Image ch10e12.jpg

This type of analysis was extended by mixing nucleotides in a known fixed proportion when making synthetic mRNA. In one experiment, the nucleotides uracil and guanine were mixed in a ratio of 3:1. When nucleotides are incorporated at random into synthetic mRNA, the relative frequency at which each triplet will appear in the sequence can be calculated on the basis of the relative proportion of the various nucleotides present (Table 10-3). Note that, in Table 10-3, UUU is used as the baseline frequency against which the other frequencies are measured in determining their respective ratios. For example, UUG, with a probability of p(UUG) = 9/64, would be expected only one-third as often as UUU, with its probability of p(UUU) = 27/64. Stated alternatively, p(UUG)/p(UUU) = 9/27 = 1/3 = 0.33, which is the ratio for UUG given in Table 10-3.
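The expected frequencies follow from multiplying base probabilities, assuming, as the text does, that bases are incorporated independently and at random. This sketch does three things:

- It computes each triplet's probability as an exact fraction.
- It computes each probability's ratio to UUU.
- It groups triplets by base composition, which is all a random-copolymer experiment can resolve.

The function name is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def triplet_probabilities(base_freqs: dict[str, Fraction]) -> dict[str, Fraction]:
    """Probability of each triplet when bases are incorporated independently at random."""
    probs = {}
    for bases in product(base_freqs, repeat=3):
        p = Fraction(1)
        for base in bases:
            p *= base_freqs[base]
        probs["".join(bases)] = p
    return probs


probs = triplet_probabilities({"U": Fraction(3, 4), "G": Fraction(1, 4)})
for triplet, p in probs.items():
    print(triplet, p, "ratio to UUU:", round(float(p / probs["UUU"]), 2))

# A random copolymer reveals composition, not order: group triplets accordingly.
by_composition = defaultdict(list)
for triplet in probs:
    by_composition[f"{triplet.count('U')}U+{triplet.count('G')}G"].append(triplet)
print(dict(by_composition))
# {'3U+0G': ['UUU'], '2U+1G': ['UUG', 'UGU', 'GUU'], '1U+2G': ['UGG', 'GUG', 'GGU'], '0U+3G': ['GGG']}
```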

Table 10-3. Expected Frequencies of Various Codons in Synthetic mRNA Composed of 3/4 Uracil and 1/4 Guanine.

If these codons each encode a different amino acid (that is, are not redundant), we expect the amino acids generated by this particular mix of guanine and uracil to be in ratios similar to those of the various codons. Although there is some redundancy among these codons, the ratios of the amino acids actually obtained from this mix of bases (Table 10-4) are indeed quite similar to the ratios seen for the codon frequencies in Table 10-3. (In Table 10-4, phenylalanine is used as the baseline in determining ratios.)

Table 10-4. Observed Frequencies of Various Amino Acids in Protein Translated from mRNA Composed of 3/4 Uracil and 1/4 Guanine.

From this evidence, we can deduce that codons consisting of one guanine and two uracils (G + 2 U) code for valine, leucine, and cysteine, although we cannot distinguish the specific sequence for each of these amino acids. Similarly, one uracil and two guanines (U + 2 G) must code for tryptophan, glycine, and perhaps one other. It looks as though the Watson-Crick model is correct in predicting the importance of the precise sequence (not just the ratios of bases). Many provisional assignments (such as those just outlined for G and U) were soon obtained, primarily by groups working with Nirenberg or with Severo Ochoa.

Before we consider other code words, we will examine tRNA molecules, which further explain the link between the mRNA codon and amino acid recognition.

tRNA recognition of the codon

Is it the tRNA or the amino acid itself that recognizes the mRNA codon for a specific amino acid? A very convincing experiment answered this question. The experimenters used an aminoacyl-tRNA (aa-tRNA), namely cysteinyl-tRNA (tRNACys, the tRNA specific for cysteine, “charged” with cysteine). They treated it with nickel hydride, which converted the cysteine (while still bound to tRNACys) into another amino acid, alanine, without affecting the tRNA:

Image ch10e13.jpg

Protein synthesized with this hybrid species had alanine wherever we would expect cysteine. Thus, the experiment demonstrated that the amino acids are “illiterate”; they are inserted at the proper position because the tRNA “adapters” recognize the mRNA codons and insert their attached amino acids appropriately. We would expect, then, to find some site on the tRNA that recognizes the mRNA codon by complementary base pairing.

Figure 10-26a shows several functional sites of the tRNA molecule. The site that recognizes an mRNA codon is called the anticodon; its bases are complementary and antiparallel to the bases of the codon. Another operationally identifiable site is the amino acid attachment site. The other arms probably assist in binding the tRNA to the ribosome. Figure 10-26b shows a specific tRNA (yeast alanine tRNA). The “flattened” cloverleafs shown in these diagrams are not the normal conformation of tRNA molecules; tRNA normally exists as an L-shaped folded cloverleaf, as shown in Figure 10-26c. These diagrams are supported by very sophisticated chemical analysis of tRNA nucleotide sequences and by X-ray crystallographic data on the overall shape of the molecule. Although tRNA molecules have many structural similarities, each has a unique three-dimensional shape that allows recognition by the correct synthetase, which catalyzes the joining of a tRNA with its specific amino acid to form an aminoacyl-tRNA. (Synthetases will be considered in this chapter under “Protein Synthesis.”) The specificity of charging the tRNAs is crucial to the integrity of protein synthesis.

Figure 10-26. The structure of transfer RNA. (a) The functional areas of a generalized tRNA molecule. (b) The specific sequence of yeast alanine tRNA. Arrows indicate several kinds of rare modified bases. (c) Diagram of the actual three-dimensional structure of yeast phenylalanine …

Where does tRNA come from? If radioactive tRNA is put into a cell nucleus in which the DNA has been partly denatured by heating, the radioactivity appears (by autoradiography) in localized regions of the chromosomes. These regions probably indicate the location of genes that specify tRNA; they are regions of DNA that produce tRNA rather than mRNA, which is translated into protein. The labeled tRNA hybridizes to these sites because of the complementarity of base sequences between the tRNA and its parent gene. A similar situation holds for rRNA. Thus, we see that even the one-gene–one-polypeptide idea is not completely valid. Some genes do not code for protein; rather, they specify RNA components of the translational apparatus.


Some genes encode proteins; other genes specify RNA (for example, tRNA or rRNA) as their final product.

How does tRNA get its fancy shape? It probably folds up spontaneously into a conformation that produces maximal stability. Transfer RNA contains many “odd” or modified bases (such as pseudouracil, ψ) in its nucleotides; these bases play a direct role in folding and have been implicated in other tRNA functions. You may have noticed some unusual base pairing within the loops of the tRNA in Figure 10-26b; G is hydrogen bonded to U (instead of C). This apparent mismatching is considered next.

The complete code

Specific code words were finally deciphered through two kinds of experiments. The first required making “mini mRNAs,” each only three nucleotides in length. These mini mRNAs are too short to promote translation into protein, but they do stimulate the binding of aminoacyl-tRNAs to ribosomes in a kind of abortive attempt at translation. It is possible to make a specific mini mRNA and determine which aminoacyl-tRNA it causes to bind to ribosomes. For example, the G + 2 U problem described earlier can be resolved by using the following mini mRNAs:

Image ch10e14.jpg

Analogous mini mRNAs could be made for all 64 possible codons.
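The triplet-binding approach separates codons that random copolymers lump together. In the sketch below, the six amino acid assignments are copied from the standard code (Figure 10-27) rather than derived. The grouping shows that once each ordering is tested on its own, every composition class from the copolymer experiment splits into distinct codons. The helper name is hypothetical.

```python
# Assignments for the six mixed U/G triplets, as listed in the standard code (Figure 10-27).
ASSIGNED = {
    "UUG": "Leu", "UGU": "Cys", "GUU": "Val",   # one G + two U
    "UGG": "Trp", "GUG": "Val", "GGU": "Gly",   # one U + two G
}


def composition(triplet: str) -> str:
    """Base composition only (what a random-copolymer experiment can resolve)."""
    return f"{triplet.count('U')}U+{triplet.count('G')}G"


classes: dict[str, dict[str, str]] = {}
for triplet, amino_acid in ASSIGNED.items():
    classes.setdefault(composition(triplet), {})[triplet] = amino_acid

for comp, members in classes.items():
    print(comp, members)
# 2U+1G {'UUG': 'Leu', 'UGU': 'Cys', 'GUU': 'Val'}
# 1U+2G {'UGG': 'Trp', 'GUG': 'Val', 'GGU': 'Gly'}
```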

The second kind of experiment that was useful in cracking the genetic code required the use of repeating copolymers. For instance, the copolymer designated (AGA)n, which is a long sequence of AGAAGAAGAAGAAGA, was used to stimulate polypeptide synthesis in vitro. From the sequence of the resulting polypeptides and the possible triplets that could reside in the respective RNA copolymer, many code words could be verified. (This kind of experiment is detailed in Problem 10 at the end of this chapter. In solving it, you can put yourself in the place of H. Gobind Khorana, who received a Nobel Prize for directing the experiments.)
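Listing the reading frames of a repeating copolymer shows why it is informative. This sketch assumes translation can start in any of the three frames and uses a short stretch of (AGA)n. The amino acid names come from the standard code (Figure 10-27) and are not computed. The function name is hypothetical.

```python
def reading_frames(seq: str) -> dict[int, list[str]]:
    """Codons in each of the three reading frames (offsets 0, 1, 2)."""
    return {offset: [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
            for offset in range(3)}


# Assignments copied from the standard code (Figure 10-27) for the codons involved.
STANDARD = {"AGA": "Arg", "GAA": "Glu", "AAG": "Lys"}

message = "AGA" * 6                     # a short stretch of (AGA)n
for offset, codons in reading_frames(message).items():
    distinct = sorted(set(codons))
    print(offset, distinct, [STANDARD[c] for c in distinct])
# 0 ['AGA'] ['Arg']
# 1 ['GAA'] ['Glu']
# 2 ['AAG'] ['Lys']
```

Each frame repeats a single codon. Under the standard code, each frame would therefore direct a homopolymer.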

Figure 10-27 gives the genetic code dictionary of 64 words. Inspect this dictionary carefully, and ponder the miracle of molecular genetics. Such an inspection should reveal several points that require further explanation.

Figure 10-27. The genetic code.

Multiple codons for a single amino acid

From the discussion of degeneracy, we know that the number of codons for a single amino acid varies, ranging from one (tryptophan = UGG) to as many as six (serine = UCU or UCC or UCA or UCG or AGU or AGC). Why? The answer is complex but not difficult; it can be divided into two parts:


Certain amino acids can be brought to the ribosome by several alternative tRNA types (species) having different anticodons, whereas certain other amino acids are brought to the ribosome by only one tRNA.


Certain tRNA species can bring their specific amino acids in response to several codons, not just one, through a loose kind of base pairing at one end of the codon and anticodon. This sloppy pairing is called wobble.


The degree of degeneracy for a given amino acid is therefore the number of its codons that are each read by a tRNA of their own, plus the number of its codons that share a tRNA through wobble.
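Codon counts per amino acid can be read directly off the dictionary. This sketch encodes the standard code (Figure 10-27) as a compact 64-letter string in the style of NCBI's translation tables. The string uses one-letter amino acid symbols and * for stop. Codons are enumerated with each of the three positions running through U, C, A, G. The variable names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard code, one-letter amino acid symbols, '*' = stop. Codons are enumerated
# with the first, second, and third bases each running through U, C, A, G.
AMINO = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODE = {"".join(codon): aa for codon, aa in zip(product("UCAG", repeat=3), AMINO)}

counts = Counter(CODE.values())
print(counts["W"], counts["S"], counts["*"])      # 1 6 3
print(sorted(c for c, aa in CODE.items() if aa == "S"))
# ['AGC', 'AGU', 'UCA', 'UCC', 'UCG', 'UCU']

# How many amino acids have 1, 2, 3, 4, or 6 codons (stop codons excluded).
per_class = Counter(n for aa, n in counts.items() if aa != "*")
print(sorted(per_class.items()))                  # [(1, 2), (2, 9), (3, 1), (4, 5), (6, 3)]
```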

We had better consider wobble first, because it will lead us into a discussion of the various species of tRNA. Wobble arises because the third nucleotide of an anticodon (at its 5′ end) is not quite aligned with the codon (Figure 10-28). This out-of-line nucleotide can sometimes form hydrogen bonds not only with its normal complementary nucleotide in the third position of the codon, but also with a different nucleotide in that position. Crick established certain “wobble rules” that dictate which nucleotides can and cannot form new hydrogen-bonded associations through wobble (Table 10-5). In Table 10-5, I (inosine) is one of the rare bases found in tRNA, often in the anticodon.

Figure 10-28. In the third site (5′ end) of the anticodon, G can take either of two wobble positions, thus being able to pair with either U or C. This ability means that a single tRNA species carrying an amino acid (in this case, serine) can recognize two codons: UCU …

Table 10-5. Codon-Anticodon Pairings Allowed by the Wobble Rules.

Figure 10-28 shows the possible codons that one tRNA serine species can recognize. As the wobble rules indicate, G can pair with U or with C. Table 10-6 lists all the codons for serine and shows how different tRNAs can service these codons. Serine affords a good example of the effects of wobble on the genetic code.
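The wobble rules can be turned into a small lookup. This sketch assumes Crick's wobble rules as they are usually stated, which are taken here to match Table 10-5. At the 5′ end of the anticodon:

- G pairs with U or C.
- U pairs with A or G.
- I pairs with U, C, or A.
- C pairs with G only.
- A pairs with U only.

The anticodons tested are hypothetical examples written 5′ to 3′, and the function name is hypothetical.

```python
# Wobble rules (assumed): base at the anticodon's 5' end -> codon 3' bases it can pair with.
WOBBLE = {"G": "UC", "U": "AG", "I": "UCA", "C": "G", "A": "U"}
# Standard Watson-Crick pairing for the other two positions.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read(anticodon_5to3: str) -> list[str]:
    """Codons (written 5'->3') recognized by an anticodon (written 5'->3').

    Pairing is antiparallel: codon position 1 pairs with anticodon position 3,
    codon position 2 with anticodon position 2, and codon position 3 (the
    wobble position) with anticodon position 1.
    """
    wobble, middle, last = anticodon_5to3
    return [PAIR[last] + PAIR[middle] + third for third in WOBBLE[wobble]]


print(codons_read("GGA"))  # ['UCU', 'UCC']
print(codons_read("IGA"))  # ['UCU', 'UCC', 'UCA']
```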

Table 10-6. Different tRNAs That Can Service Codons for Serine.

Sometimes there can be an additional tRNA species that we represent as tRNASer4; it has an anticodon identical to one of the three anticodons shown in Table 10-6, but it differs in its nucleotide sequence elsewhere in the tRNA molecule. These four tRNAs are called isoaccepting tRNAs because they accept the same amino acid, but they are probably all transcribed from different tRNA genes.

Stop codons

The second point that you may have noticed in Figure 10-27 is that some codons do not specify an amino acid at all. These codons are labeled as stop or termination codons. They can be regarded as being similar to periods or commas punctuating the message encoded in the DNA.

One of the first indications of the existence of stop codons came in 1965 from Brenner’s work with the T4 phage. Brenner analyzed certain mutations (m1 through m6) in a single gene that controls the head protein of the phage. These mutants had two things in common. First, the head protein of each mutant was a shorter polypeptide chain than that of the wild type. Second, the presence of a suppressor mutation (su) in the host chromosome would cause the phage to develop a head protein of normal (wild-type) chain length despite the presence of the m mutation (Figure 10-29).

Figure 10-29. Polypeptide chain lengths of phage T4 head protein in wild type (top) and various amber mutants (m). An amber suppressor (su) leads to phenotypic development of the wild-type chain.

Brenner examined the ends of the shortened proteins and compared them with wild-type protein, recording for each mutant the next amino acid that would have been inserted to continue the wild-type chain. These amino acids for the six mutations were glutamine, lysine, glutamic acid, tyrosine, tryptophan, and serine. There is no immediately obvious pattern to these results, but Brenner brilliantly deduced that certain codons for each of these amino acids are similar in that each of them can mutate to the codon UAG by a single change in a DNA nucleotide pair. He therefore postulated that UAG is a stop (termination) codon—a signal to the translation mechanism that the protein is now complete.
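Brenner's deduction can be replayed by listing every sense codon that is one base change away from UAG. The sketch encodes the standard code (Figure 10-27) as a compact 64-letter string, with codons enumerated in U, C, A, G order at each position. The helper and variable names are hypothetical.

```python
from itertools import product

# Standard code, one-letter symbols, '*' = stop; codons enumerated in UCAG order.
AMINO = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODE = {"".join(codon): aa for codon, aa in zip(product("UCAG", repeat=3), AMINO)}
NAMES = {"K": "Lys", "Q": "Gln", "E": "Glu", "L": "Leu", "S": "Ser", "W": "Trp", "Y": "Tyr"}


def one_step_neighbors(codon: str):
    """Yield every codon that differs from `codon` at exactly one position."""
    for i, base in enumerate(codon):
        for new in "UCAG":
            if new != base:
                yield codon[:i] + new + codon[i + 1:]


sources: dict[str, list[str]] = {}
for neighbor in one_step_neighbors("UAG"):
    aa = CODE[neighbor]
    if aa != "*":                       # skip UAA, itself a stop codon
        sources.setdefault(NAMES[aa], []).append(neighbor)

for name in sorted(sources):
    print(name, sources[name])

observed = {"Gln", "Lys", "Glu", "Tyr", "Trp", "Ser"}  # the six amino acids in the text
print(observed <= set(sources))   # True
```

All six amino acids that Brenner recorded appear in the output. Leucine (UUG) is also one step away, but it is not among the six mutations described here.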

UAG was the first stop codon deciphered; it is called the amber codon. Mutants that are defective owing to the presence of an abnormal amber codon are called amber mutants, and their suppressors are amber suppressors. UGA, the opal codon, and UAA, the ochre codon, also are stop codons and also have suppressors. Stop codons are often called nonsense codons because they designate no amino acid. Not surprisingly, stop codons do not act as mini mRNAs in binding aa-tRNA to ribosomes in vitro. We shall consider stop codons and their suppressors further after we have dealt with the process of protein synthesis.
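A stop codon ends the chain wherever it appears, which is why amber mutants make shortened head protein. This sketch makes two simplifications. Translation starts at the first base of a hypothetical message rather than at a real initiation site. The standard code is encoded as a compact 64-letter string. The function returns the peptide in one-letter symbols plus the name of the stop codon that ended it. The names are hypothetical.

```python
from itertools import product
from typing import Optional

# Standard code, one-letter symbols, '*' = stop; codons enumerated in UCAG order.
AMINO = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODE = {"".join(codon): aa for codon, aa in zip(product("UCAG", repeat=3), AMINO)}
STOP_NAMES = {"UAG": "amber", "UAA": "ochre", "UGA": "opal"}


def translate(mrna: str) -> tuple[str, Optional[str]]:
    """Translate from the first base, stopping at the first stop codon.

    Returns the peptide (one-letter symbols) and the name of the stop codon
    that ended it, or None if the message runs out first.
    """
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if CODE[codon] == "*":
            return "".join(peptide), STOP_NAMES[codon]
        peptide.append(CODE[codon])
    return "".join(peptide), None


wild = "AUGCAGUGGAAAUCGUAA"          # hypothetical: Met-Gln-Trp-Lys-Ser, then UAA
amber = wild[:3] + "U" + wild[4:]    # C -> U turns codon 2 (CAG, Gln) into UAG
print(translate(wild))               # ('MQWKS', 'ochre')
print(translate(amber))              # ('M', 'amber')
```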


Copyright © 2000, W. H. Freeman and Company.
Bookshelf ID: NBK21950

