NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Brown TA. Genomes. 2nd edition. Oxford: Wiley-Liss; 2002.

Chapter 14. Mutation, Repair and Recombination

Learning outcomes

When you have read Chapter 14, you should be able to:

  • Distinguish between the terms ‘mutation’ and ‘recombination’, and define the various terms that are used to identify different types of mutation
  • Describe, with specific examples, how mutations are caused by spontaneous errors in replication and by chemical and physical mutagens
  • Recount, with specific examples, the effects of mutations on genomes and organisms
  • Discuss the biological significance of hypermutation and programmed mutations
  • Distinguish between the various types of DNA repair mechanism, and give detailed descriptions of the molecular events occurring during each type of repair
  • Outline the link between DNA repair and human disease
  • Draw diagrams, with detailed annotation, illustrating the processes of homologous recombination, gene conversion, site-specific recombination, conservative and replicative transposition, and retrotransposition, and discuss the biological significance of each of these mechanisms

Genomes are dynamic entities that change over time as a result of the cumulative effects of small-scale sequence alterations caused by mutation and larger scale rearrangements arising from recombination. Mutation and recombination can both be defined as processes that result in changes to a genome, but they are unrelated and we must make a clear distinction between them:

  • A mutation (Section 14.1) is a change in the nucleotide sequence of a short region of a genome (Figure 14.1A). Many mutations are point mutations that replace one nucleotide with another; others involve insertion or deletion of one or a few nucleotides. Mutations result either from errors in DNA replication or from the damaging effects of mutagens, such as chemicals and radiation, which react with DNA and change the structures of individual nucleotides. All cells possess DNA-repair enzymes that attempt to minimize the number of mutations that occur (Section 14.2). These enzymes work in two ways. Some are pre-replicative and search the DNA for nucleotides with unusual structures, these being replaced before replication occurs; others are post-replicative and check newly synthesized DNA for errors, correcting any errors that they find (Figure 14.1B). A possible definition of mutation is therefore a deficiency in DNA repair.
  • Recombination (Section 14.3) results in a restructuring of part of a genome, for example by exchange of segments of homologous chromosomes during meiosis or by transposition of a mobile element from one position to another within a chromosome or between chromosomes (Figure 14.1C). Various other events that we have studied, including mating-type switching in yeast (see Figure 12.13) and construction of immunoglobulin genes (see Figure 12.15), are also the results of recombination. Recombination is a cellular process which, like other cellular processes involving DNA (e.g. transcription and replication), is carried out and regulated by enzymes and other proteins.

Figure 14.1. Mutation, repair and recombination. (A) A mutation is a small-scale change in the nucleotide sequence of a DNA molecule. A point mutation is shown but there are several other types of mutation, as described in the text. (B) DNA repair corrects mutations (more...)

Both mutation and recombination can have dramatic effects on the cell in which they occur. A mutation in a key gene may cause the cell to die if the protein coded by the mutant gene is defective (Section 14.1.2), and some recombination events lead to defining changes in the biochemical capabilities of the cell, for example by determining the mating type of a yeast cell or the immunological properties of a mammalian B or T lymphocyte. Other mutation and recombination events have a less significant impact on the phenotype of the cell and many have none at all. As we will see in Chapter 15, all events that are not lethal have the potential to contribute to the evolution of the genome but for this to happen they must be inherited when the organism reproduces. With a single-celled organism such as a bacterium or yeast, all genome alterations that are not lethal or reversible are inherited by daughter cells and become permanent features of the lineage that descends from the original cell in which the alteration occurred. In a multicellular organism, only those events that occur in germ cells are relevant to genome evolution. Changes to the genomes of somatic cells are unimportant in an evolutionary sense, but they will have biological relevance if they result in a deleterious phenotype that affects the health of the organism.

Box 14.1. Terminology for describing point mutations. Point mutations are also called simple mutations or single-site mutations. They are sometimes described as substitution mutations but this risks confusion because to an evolutionary geneticist, ‘substitution’ (more...)

14.1. Mutations

With mutations, the issues that we have to consider are: how they arise; the effects they have on the genome and on the organism in which the genome resides; whether it is possible for a cell to increase its mutation rate and induce programmed mutations under certain circumstances; and how mutations are repaired.

14.1.1. The causes of mutations

Mutations arise in two ways:

  • Some mutations are spontaneous errors in replication that evade the proofreading function of the DNA polymerases that synthesize new polynucleotides at the replication fork (Section 13.2.2). These mutations are called mismatches because they are positions where the nucleotide that is inserted into the daughter polynucleotide does not match, by base-pairing, the nucleotide at the corresponding position in the template DNA (Figure 14.2A). If the mismatch is retained in the daughter double helix then one of the granddaughter molecules produced during the next round of DNA replication will carry a permanent double-stranded version of the mutation.
  • Other mutations arise because a mutagen has reacted with the parent DNA, causing a structural change that affects the base-pairing capability of the altered nucleotide. Usually this alteration affects only one strand of the parent double helix, so only one of the daughter molecules carries the mutation, but two of the granddaughter molecules produced during the next round of replication will have it (Figure 14.2B).

Figure 14.2. Examples of mutations. (A) An error in replication leads to a mismatch in one of the daughter double helices, in this case a T-to-C change because one of the As in the template DNA was miscopied. When the mismatched molecule is itself replicated it gives (more...)
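The way a mismatch becomes a permanent double-stranded mutation can be followed with a small model. The sketch below is illustrative only: it tracks a single base-pair position, assumes error-free copying in the second round, and uses hypothetical names (`replicate`, `is_matched`) that do not come from the text.

```python
# Watson-Crick partners for the four standard bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def replicate(duplex):
    """Semi-conservative replication of one base-pair position.

    `duplex` is a (top, bottom) tuple of single bases. Each parental strand
    is kept and templates a new, correctly paired partner, so one duplex
    gives two daughter duplexes. Error-free copying is assumed here.
    """
    top, bottom = duplex
    return [(top, PAIR[top]), (PAIR[bottom], bottom)]

def is_matched(duplex):
    """True if the two bases of the duplex form a standard base pair."""
    top, bottom = duplex
    return PAIR[top] == bottom

# A daughter helix carrying a replication error: template A paired with C
# instead of T (the T-to-C change described for Figure 14.2A).
mismatched_daughter = ("A", "C")

granddaughters = replicate(mismatched_daughter)
print(granddaughters)  # [('A', 'T'), ('G', 'C')]
```

One granddaughter has the original A-T pair and the other has a fully base-paired G-C pair at the same position, which is the permanent version of the mutation.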

Errors in replication are a source of point mutations

When considered purely as a chemical reaction, complementary base-pairing is not particularly accurate. Nobody has yet devised a way of carrying out the template-dependent synthesis of DNA without the aid of enzymes, but if the process could be carried out simply as a chemical reaction in a test tube then the resulting polynucleotide would probably have point mutations at 5–10 positions out of every hundred. This represents an error rate of 5–10%, which would be completely unacceptable during genome replication. The template-dependent DNA polymerases that carry out DNA replication must therefore increase the accuracy of the process by several orders of magnitude. This improvement is brought about in two ways:

  • The DNA polymerase operates a nucleotide selection process that dramatically increases the accuracy of template-dependent DNA synthesis (Figure 14.3A). This selection process probably acts at three different stages during the polymerization reaction, discrimination against an incorrect nucleotide occurring when the nucleotide is first bound to the DNA polymerase, when it is shifted to the active site of the enzyme, and when it is attached to the 3′ end of the polynucleotide that is being synthesized.
  • The accuracy of DNA synthesis is increased still further if the DNA polymerase possesses a 3′→5′ exonuclease activity and so is able to remove an incorrect nucleotide that evades the base selection process and becomes attached to the 3′ end of the new polynucleotide (see Figure 4.7B). This is called proofreading (Section 13.2.2), but the name is a misnomer because the process is not an active checking mechanism. Instead, each step in the synthesis of a polynucleotide should be viewed as a competition between the polymerase and exonuclease functions of the enzyme, the polymerase usually winning because it is more active than the exonuclease, at least when the 3′-terminal nucleotide is base-paired to the template. But the polymerase activity is less efficient if the terminal nucleotide is not base-paired, the resulting pause in polymerization allowing the exonuclease activity to predominate so that the incorrect nucleotide is removed (see Figure 14.3B).

Figure 14.3. Mechanisms for ensuring the accuracy of DNA replication. (A) The DNA polymerase actively selects the correct nucleotide to insert at each position. (B) Those errors that occur can be corrected by ‘proofreading’ if the polymerase has a (more...)

Escherichia coli is able to synthesize DNA with an error rate of only 1 per 10⁷ nucleotide additions. Interestingly, these errors are not evenly distributed between the two daughter molecules, the product of lagging-strand replication being prone to about 20 times as many errors as the leading-strand replicate. This asymmetry might indicate that DNA polymerase I, which is involved only in lagging-strand replication (Section 13.2.2), has a less effective base selection and proofreading capability compared with DNA polymerase III, the main replicating enzyme (Francino and Ochman, 1997).

Not all of the errors that occur during DNA synthesis can be blamed on the polymerase enzymes: sometimes an error occurs even though the enzyme adds the ‘correct’ nucleotide, the one that base-pairs with the template. This is because each nucleotide base can occur as either of two alternative tautomers, structural isomers that are in dynamic equilibrium. For example, thymine exists as two tautomers, the keto and enol forms, with individual molecules occasionally undergoing a shift from one tautomer to the other. The equilibrium is biased very much towards the keto form but every now and then the enol version of thymine occurs in the template DNA at the precise time that the replication fork is moving past. This will lead to an ‘error’, because enol-thymine base-pairs with G rather than A (Figure 14.4). The same problem can occur with adenine, the rare imino tautomer of this base preferentially forming a pair with C, and with guanine, enol-guanine pairing with thymine. After replication, the rare tautomer will inevitably revert to its more common form, leading to a mismatch in the daughter double helix.

Figure 14.4. The effects of tautomerism on base-pairing. In each of these three examples, the two tautomeric forms of the base have different pairing properties. Cytosine also has amino and imino tautomers but both pair with guanine.

As stated above, the error rate for DNA synthesis in E. coli is 1 in 10⁷. The overall error rate for replication of the E. coli genome is only 1 in 10¹⁰ to 1 in 10¹¹, the improvement compared with the polymerase error rate being the result of the mismatch repair system (Section 14.2.3) that scans newly replicated DNA for positions where the bases are unpaired and hence corrects the few mistakes that the replication enzymes make. The implication is that only one uncorrected replication error occurs every 1000 times that the E. coli genome is copied.
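The arithmetic behind an estimate of this kind can be sketched as follows. The genome size is an assumption that is not stated in the text (roughly 4.6 million base pairs for E. coli K-12), and the calculation further assumes that copying a double-stranded genome involves one nucleotide addition per base on each strand. The function names are hypothetical.

```python
# Assumption (not from the text): an E. coli genome of about 4.6 million bp.
GENOME_SIZE_BP = 4.6e6

# Assumption: both strands are synthesized, so each genome copy requires
# two nucleotide additions per base pair.
ADDITIONS_PER_COPY = 2 * GENOME_SIZE_BP

def errors_per_copy(error_rate, additions=ADDITIONS_PER_COPY):
    """Expected number of errors each time the genome is copied."""
    return error_rate * additions

def copies_per_error(error_rate, additions=ADDITIONS_PER_COPY):
    """Average number of genome copies made for each error that occurs."""
    return 1.0 / errors_per_copy(error_rate, additions)

rates = [
    ("polymerase alone (1 in 10^7)", 1e-7),
    ("after mismatch repair (1 in 10^10)", 1e-10),
    ("after mismatch repair (1 in 10^11)", 1e-11),
]
for label, rate in rates:
    print(f"{label}: {errors_per_copy(rate):.2e} errors per copy, "
          f"about one error every {copies_per_error(rate):,.0f} copies")
```

Under these assumptions the polymerase alone would make a little under one error per genome copy, whereas an overall rate of 1 in 10¹⁰ gives about one error per thousand copies, the same order of magnitude as the figure quoted above.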

Replication errors can also lead to insertion and deletion mutations

Not all errors in replication are point mutations. Aberrant replication can also result in small numbers of extra nucleotides being inserted into the polynucleotide being synthesized, or some nucleotides in the template not being copied. Insertions and deletions are often called frameshift mutations because when one occurs within a coding region it can result in a shift in the reading frame used for translation of the protein specified by the gene (see Figure 14.12). However, it is inaccurate to use ‘frameshift’ to describe all insertions and deletions because they can occur anywhere, not just in genes, and not all insertions or deletions in coding regions result in frameshifts: an insertion or deletion of three nucleotides, or multiples of three, simply adds or removes codons or parts of adjacent codons without affecting the reading frame.

Figure 14.12. Deletion mutations. In the top sequence three nucleotides comprising a single codon are deleted. This shortens the resulting protein product by one amino acid but does not affect the rest of its sequence. In the lower section, a single nucleotide is deleted. (more...)
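The difference between an in-frame deletion and a frameshift can be seen by translating a short sequence before and after each change. The following is a minimal sketch: the gene sequence is invented for illustration, the standard genetic code is assumed, and the helper names (`translate`, `delete`) are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate from the first base, stopping at the first termination codon.

    A trailing partial codon (one or two bases) is ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def delete(dna, start, length):
    """Remove `length` nucleotides beginning at 0-based position `start`."""
    return dna[:start] + dna[start + length:]

# Invented example gene: ATG GCT AAA GGT TTA CCA GAA TAA
gene = "ATGGCTAAAGGTTTACCAGAATAA"

print(translate(gene))                 # MAKGLPE
print(translate(delete(gene, 6, 3)))   # MAGLPE  (one codon lost, frame kept)
print(translate(delete(gene, 6, 1)))   # MAKVYQN (frameshift after the deletion)
```

Deleting a whole codon removes one amino acid and leaves the rest of the protein unchanged, whereas deleting a single nucleotide changes every codon downstream and, in this example, also removes the original termination codon from the reading frame.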

Insertion and deletion mutations can affect all parts of the genome but are particularly prevalent when the template DNA contains short repeated sequences, such as those found in microsatellites (Section 2.4.1). This is because repeated sequences can induce replication slippage, in which the template strand and its copy shift their relative positions so that part of the template is either copied twice or missed out. The result is that the new polynucleotide has a larger or smaller number, respectively, of the repeat units (Figure 14.5). This is the main reason why microsatellite sequences are so variable, replication slippage occasionally generating a new length variant, adding to the collection of alleles already present in the population.

Figure 14.5. Replication slippage. The diagram shows replication of a five-unit CA repeat microsatellite. Slippage has occurred during replication of the parent molecule, inserting an additional repeat unit into the newly synthesized polynucleotide of one of the daughter (more...)
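Replication slippage can be pictured as the copying position shifting along the template by one repeat unit. The sketch below is a deliberately simplified model, not a description of the enzymology: the tract is treated as a list of repeat units, and the function name and its `slip_at` and `shift` parameters are hypothetical.

```python
def copy_repeat_tract(template_units, slip_at=None, shift=0):
    """Copy a repeat tract unit by unit, with at most one slippage event.

    When copying reaches index `slip_at`, the position on the template moves
    by `shift` units: -1 means the previous unit is copied a second time
    (insertion), +1 means one unit is skipped (deletion).
    """
    copied = []
    i = 0
    slipped = False
    while i < len(template_units):
        if i == slip_at and not slipped:
            slipped = True
            i = max(0, i + shift)  # shift the copying position once
            continue
        copied.append(template_units[i])
        i += 1
    return copied

template = ["CA"] * 5  # the five-unit CA repeat of Figure 14.5

normal = copy_repeat_tract(template)
longer = copy_repeat_tract(template, slip_at=3, shift=-1)
shorter = copy_repeat_tract(template, slip_at=3, shift=+1)

print(len(normal), len(longer), len(shorter))  # 5 6 4
```

Each slippage event changes the tract length by one unit, which is how new length variants of a microsatellite can arise.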

Replication slippage is probably also responsible for the trinucleotide repeat expansion diseases that have been discovered in humans in recent years (Ashley and Warren, 1995). Each of these neurodegenerative diseases is caused by a relatively short series of trinucleotide repeats becoming elongated to two or more times its normal length. For example, the human HD gene contains the sequence 5′-CAG-3′ repeated between 6 and 35 times in tandem, coding for a series of glutamines in the protein product. In Huntington's disease this repeat expands to a copy number of 36–121, increasing the length of the polyglutamine tract and resulting in a dysfunctional protein (Perutz, 1999). Several other human diseases are also caused by expansions of polyglutamine codons (Table 14.1). Some diseases associated with mental retardation result from trinucleotide expansions in the leader region of a gene, giving a fragile site, a position where the chromosome is likely to break (Sutherland et al., 1998). Expansions involving intron and trailer regions are also known.
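The repeat ranges quoted for the HD gene lend themselves to a simple check. The sketch below counts the longest uninterrupted run of CAG units in a sequence and compares it with the ranges given in the text. It is an illustration only, not a diagnostic method; the function names and the test sequences are hypothetical.

```python
import re

def longest_cag_run(seq):
    """Number of units in the longest uninterrupted tandem run of CAG."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)

def classify_hd_repeat(n_repeats):
    """Compare a CAG copy number with the ranges quoted in the text."""
    if 6 <= n_repeats <= 35:
        return "normal range"
    if 36 <= n_repeats <= 121:
        return "expanded (Huntington's disease range)"
    return "outside the ranges quoted in the text"

# Invented sequences: flanking bases surrounding 20 or 40 CAG units.
normal_allele = "ATG" + "CAG" * 20 + "CCG"
expanded_allele = "ATG" + "CAG" * 40 + "CCG"

for allele in (normal_allele, expanded_allele):
    n = longest_cag_run(allele)
    print(n, classify_hd_repeat(n))
```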

Table 14.1. Examples of human trinucleotide repeat expansions.

How triplet expansions are generated is not precisely understood. The size of the insertion is much greater than occurs with normal replication slippage, such as that seen with microsatellite sequences, and once the expansion reaches a certain length it appears to become susceptible to further expansion in subsequent rounds of replication, so that the disease becomes increasingly severe in succeeding generations. The possibility that expansion involves formation of hairpin loops in the DNA has been raised, based on the observation that only a limited number of trinucleotide sequences are known to undergo expansion, and all of these sequences are GC-rich and so might form stable secondary structures. There is also evidence that at least one triplet expansion region, the one associated with Friedreich's ataxia, can form a triple helix structure (Gacy et al., 1998). Studies of similar triplet expansions in yeast have shown that these are more prevalent when the RAD27 gene is inactivated (Freudenreich et al., 1998), an interesting observation as RAD27 is the yeast version of the mammalian gene for FEN1, the protein involved in processing of Okazaki fragments (Section 13.2.2). This might indicate that a trinucleotide repeat expansion is caused by an aberration in lagging-strand synthesis.

Mutations are also caused by chemical and physical mutagens

Many chemicals that occur naturally in the environment have mutagenic properties and these have been supplemented in recent years with other chemical mutagens that result from human industrial activity. Physical agents such as radiation are also mutagenic. Most organisms are exposed to greater or lesser amounts of these various mutagens, their genomes suffering damage as a result.

A ‘mutagen’ is defined as a chemical or physical agent that causes mutations. This definition is important because it distinguishes mutagens from other types of environmental agent that cause damage to cells in ways other than by causing mutations (Table 14.2). There are overlaps between these categories (for example, some mutagens are also carcinogens) but each type of agent has a distinct biological effect. The definition of ‘mutagen’ also makes a distinction between true mutagens and other agents that damage DNA without causing mutations, for example by causing breaks in DNA molecules. This type of damage may block replication and cause the cell to die, but it is not a mutation in the strict sense of the term and the causative agents are therefore not mutagens.

Table 14.2. Categories of environmental agent that cause damage to living cells.

Mutagens cause mutations in three different ways:

  • Some act as base analogs and are mistakenly used as substrates when new DNA is synthesized at the replication fork.
  • Some react directly with DNA, causing structural changes that lead to miscopying of the template strand when the DNA is replicated. These structural changes are diverse, as we will see when we look at individual mutagens.
  • Some mutagens act indirectly on DNA. They do not themselves affect DNA structure, but instead cause the cell to synthesize chemicals such as peroxides that have a direct mutagenic effect.

The range of mutagens is so vast that it is difficult to devise an all-embracing classification. We will therefore restrict our study to the most common types. For chemical mutagens these are as follows:

  • Base analogs are purine and pyrimidine bases that are similar enough to the standard bases to be incorporated into nucleotides when these are synthesized by the cell. The resulting unusual nucleotides can then be used as substrates for DNA synthesis during genome replication. For example, 5-bromouracil (5-bU; Figure 14.6A) has the same base-pairing properties as thymine, and nucleotides containing this base can be added to the daughter polynucleotide at positions opposite As in the template. The mutagenic effect arises because the equilibrium between the two tautomers of 5-bU is shifted more towards the rarer enol form than is the case with thymine. This means that during the next round of replication there is a relatively high chance of the polymerase encountering enol-5-bU, which (like enol-thymine) pairs with G rather than A (Figure 14.6B). This results in a point mutation (Figure 14.6C). 2-Aminopurine acts in a similar way: it is an analog of adenine with an amino tautomer that pairs with thymine and an imino tautomer that pairs with cytosine, the imino form being more common than imino-adenine and hence inducing T-to-C transitions during DNA replication.
  • Deaminating agents also cause point mutations. A certain amount of base deamination (removal of an amino group) occurs spontaneously in genomic DNA molecules, with the rate being increased by chemicals such as nitrous acid, which deaminates adenine, cytosine and guanine (thymine has no amino group and so cannot be deaminated), and sodium bisulfite, which acts only on cytosine. Deamination of guanine is not mutagenic because the resulting base, xanthine, blocks replication when it appears in the template polynucleotide. Deamination of adenine gives hypoxanthine (Figure 14.7), which pairs with C rather than T, and deamination of cytosine gives uracil, which pairs with A rather than G. Deamination of these two bases therefore results in point mutations when the template strand is copied.
  • Alkylating agents are a third type of mutagen that can give rise to point mutations. Chemicals such as ethylmethane sulfonate (EMS) and dimethylnitrosamine add alkyl groups to nucleotides in DNA molecules, as do methylating agents such as methyl halides which are present in the atmosphere, and the products of nitrite metabolism. The effect of alkylation depends on the position at which the nucleotide is modified and the type of alkyl group that is added. Methylations, for example, often result in modified nucleotides with altered base-pairing properties and so lead to point mutations. Other alkylations block replication by forming crosslinks between the two strands of a DNA molecule, or by adding large alkyl groups that prevent progress of the replication complex.
  • Intercalating agents are usually associated with insertion mutations. The best known mutagen of this type is ethidium bromide, which fluoresces when exposed to UV radiation and so is used to reveal the positions of DNA bands after agarose gel electrophoresis (see Technical Note 2.1). Ethidium bromide and other intercalating agents are flat molecules that can slip between base pairs in the double helix, slightly unwinding the helix and hence increasing the distance between adjacent base pairs (Figure 14.8).
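The consequences of deamination described in the list above can be traced through two rounds of copying. The sketch below encodes only the pairing rules stated in the text (hypoxanthine pairs with C, uracil pairs with A, xanthine blocks replication). The single-letter codes H and U and the function name are hypothetical conveniences.

```python
# Standard Watson-Crick partners.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Deamination products that are mutagenic, as described in the text:
# adenine -> hypoxanthine (H), cytosine -> uracil (U).
DEAMINATED = {"A": "H", "C": "U"}

# Base inserted opposite each deaminated base when the template is copied.
DEAMINATED_PAIR = {"H": "C", "U": "A"}

def deamination_outcome(base):
    """Describe what happens when `base` in the template is deaminated."""
    if base == "T":
        return "thymine has no amino group and cannot be deaminated"
    if base == "G":
        return "xanthine blocks replication, so no mutation results"
    damaged = DEAMINATED[base]
    inserted = DEAMINATED_PAIR[damaged]   # first round: copied opposite the damage
    new_partner = PAIR[inserted]          # second round: normal pairing fixes the change
    return f"{base}-{PAIR[base]} pair becomes {new_partner}-{inserted} pair"

for base in "ACGT":
    print(base, "->", deamination_outcome(base))
```

For adenine the original A-T pair is replaced by G-C, and for cytosine the original C-G pair is replaced by T-A, so both are point mutations.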

Figure 14.6. 5-Bromouracil and its mutagenic effect. See the text for details.

Figure 14.7. Hypoxanthine is a deaminated version of adenine. The nucleoside that contains hypoxanthine is called inosine (see Table 10.5).

Figure 14.8. The mutagenic effect of ethidium bromide. (A) Ethidium bromide is a flat plate-like molecule that is able to slot in between the base pairs of the double helix. (B) Ethidium bromide molecules are shown intercalated into the helix: the molecules are viewed (more...)

The most important types of physical mutagen are as follows:

  • UV radiation of 260 nm induces dimerization of adjacent pyrimidine bases, especially if these are both thymines (Figure 14.9A), resulting in a cyclobutyl dimer. Other pyrimidine combinations also form dimers, the order of frequency being 5′-CT-3′ > 5′-TC-3′ > 5′-CC-3′. Purine dimers are much less common. UV-induced dimerization usually results in a deletion mutation when the modified strand is copied. Another type of UV-induced photoproduct is the (6-4) lesion in which carbons number 4 and 6 of adjacent pyrimidines become covalently linked (Figure 14.9B).
  • Ionizing radiation has various effects on DNA depending on the type of radiation and its intensity. Point, insertion and/or deletion mutations might arise, as well as more severe forms of DNA damage that prevent subsequent replication of the genome. Some types of ionizing radiation act directly on DNA, others act indirectly by stimulating the formation of reactive molecules such as peroxides in the cell.
  • Heat stimulates the water-induced cleavage of the β-N-glycosidic bond that attaches the base to the sugar component of the nucleotide (Figure 14.10A). This occurs more frequently with purines than with pyrimidines and results in an AP (apurinic/apyrimidinic) or baseless site. The sugar-phosphate that is left is unstable and rapidly degrades, leaving a gap if the DNA molecule is double stranded (Figure 14.10B). This reaction is not normally mutagenic because cells have effective systems for repairing nicks (Section 14.2.1), which is reassuring when one considers that 10 000 AP sites are generated in each human cell per day. Gaps do, however, lead to mutations under certain circumstances, for example in E. coli when the SOS response is activated, when gaps are filled with As regardless of the identity of the nucleotide in the other strand (Section 14.1.3).

Figure 14.9. Photoproducts induced by UV irradiation. A segment of a polynucleotide containing two adjacent thymine bases is shown. (A) A thymine dimer contains two UV-induced covalent bonds, one linking the carbons at position 6 and the other linking the carbons (more...)
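Because pyrimidine dimers form between adjacent pyrimidines on the same strand, the positions at risk in a sequence can be listed by a simple scan. This is a minimal sketch with a hypothetical function name and an invented sequence. It says nothing about how likely each site is to be damaged.

```python
PYRIMIDINES = {"C", "T"}

def adjacent_pyrimidine_sites(seq):
    """Return (0-based index, dinucleotide) for each pair of adjacent
    pyrimidines, reading the given strand in the 5' to 3' direction."""
    seq = seq.upper()
    return [
        (i, seq[i:i + 2])
        for i in range(len(seq) - 1)
        if seq[i] in PYRIMIDINES and seq[i + 1] in PYRIMIDINES
    ]

# Invented example strand.
sites = adjacent_pyrimidine_sites("GATTCGCCA")
print(sites)  # [(2, 'TT'), (3, 'TC'), (6, 'CC')]
```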

Figure 14.10. The mutagenic effect of heat. (A) Heat induces hydrolysis of β-N-glycosidic bonds, resulting in a baseless site in a polynucleotide. (B) Schematic representation of the effect of heat-induced hydrolysis on a double-stranded DNA molecule. The baseless (more...)

14.1.2. The effects of mutations

When considering the effects of mutations we must make a distinction between the direct effect that a mutation has on the functioning of a genome and its indirect effect on the phenotype of the organism in which it occurs. The direct effect is relatively easy to assess because we can use our understanding of gene structure and expression to predict the impact that a mutation will have on genome function. The indirect effects are more complex because these relate to the phenotype of the mutated organism which, as described in Section 7.2.2, is often difficult to correlate with the activities of individual genes.

The effects of mutations on genomes

Many mutations result in nucleotide sequence changes that have no effect on the functioning of the genome. These silent mutations include virtually all of those that occur in intergenic DNA and in the non-coding components of genes and gene-related sequences. In other words, some 98.5% of the human genome (see Box 1.4) can be mutated without significant effect.

Mutations in the coding regions of genes are much more important. First, we will look at point mutations that change the sequence of a triplet codon. A mutation of this type will have one of four effects (Figure 14.11):

  • It may result in a synonymous change, the new codon specifying the same amino acid as the unmutated codon. A synonymous change is therefore a silent mutation because it has no effect on the coding function of the genome: the mutated gene codes for exactly the same protein as the unmutated gene.
  • It may result in a non-synonymous change, the mutation altering the codon so that it specifies a different amino acid. The protein coded by the mutated gene therefore has a single amino acid change. This often has no significant effect on the biological activity of the protein because most proteins can tolerate at least a few amino acid changes without noticeable effect on their ability to function in the cell, but changes to some amino acids, such as those at the active site of an enzyme, have a greater impact. A non-synonymous change is also called a missense mutation.
  • The mutation may convert a codon that specifies an amino acid into a termination codon. This is a nonsense mutation and it results in a shortened protein because translation of the mRNA stops at this new termination codon rather than proceeding to the correct termination codon further downstream. The effect of this on protein activity depends on how much of the polypeptide is lost: usually the effect is drastic and the protein is non-functional.
  • The mutation could convert a termination codon into one specifying an amino acid, resulting in readthrough of the stop signal so the protein is extended by an additional series of amino acids at its C terminus. Most proteins can tolerate short extensions without an effect on function, but longer extensions might interfere with folding of the protein and so result in reduced activity.

Figure 14.11. Effects of point mutations on the coding region of a gene. Four different effects of point mutations are shown, as described in the text. The readthrough mutation results in the gene being extended beyond the end of the sequence shown here, the leucine (more...)
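The four outcomes of a point mutation in a codon can be distinguished mechanically by comparing what the codon specifies before and after the change. The sketch below assumes the standard genetic code. The function name and the example codons are chosen for illustration and are not taken from Figure 14.11.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_point_mutation(codon_before, codon_after):
    """Classify a codon change using the four categories in the text."""
    before = CODON_TABLE[codon_before]
    after = CODON_TABLE[codon_after]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    if before == "*":
        return "readthrough"
    return "non-synonymous (missense)"

examples = [
    ("CTG", "CTA"),  # leucine -> leucine
    ("AAA", "AGA"),  # lysine -> arginine
    ("TGG", "TGA"),  # tryptophan -> stop
    ("TAA", "TTA"),  # stop -> leucine
]
for before, after in examples:
    print(before, "->", after, ":", classify_point_mutation(before, after))
```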

Deletion and insertion mutations also have distinct effects on the coding capabilities of genes (Figure 14.12). If the number of deleted or inserted nucleotides is three or a multiple of three then one or more codons are removed or added, the resulting loss or gain of amino acids having varying effects on the function of the encoded protein. Deletions or insertions of this type are often inconsequential but will have an impact if, for example, amino acids involved in an enzyme's active site are lost, or if an insertion disrupts an important secondary structure in the protein. On the other hand, if the number of deleted or inserted nucleotides is not three or a multiple of three then a frameshift results, all of the codons downstream of the mutation being taken from a different reading frame from that used in the unmutated gene. This usually has a significant effect on the protein function, because a greater or lesser part of the mutated polypeptide has a completely different sequence to the normal polypeptide.

It is less easy to make generalizations about the effects of mutations that occur outside of the coding regions of the genome. Any protein binding site is susceptible to point, insertion or deletion mutations that change the identity or relative positioning of nucleotides involved in the DNA-protein interaction. These mutations therefore have the potential to inactivate promoters or regulatory sequences, with predictable consequences for gene expression (Figure 14.13; Sections 9.2 and 9.3). Origins of replication could conceivably be made non-functional by mutations that change, delete or disrupt sequences recognized by the relevant binding proteins (Section 13.2.1) but these possibilities are not well documented. There is also little information about the potential impact on gene expression of mutations that affect nucleosome positioning (Section 8.2.1).

Figure 14.13. Two possible effects of deletion mutations in the region upstream of a gene.

One area that has been better researched concerns mutations that occur in introns or at intron-exon boundaries. In these regions, single point mutations will be important if they change nucleotides involved in the RNA-protein and RNA-RNA interactions that occur during splicing of different types of intron (Sections 10.1.3 and 10.2.3). For example, mutation of either the G or T in the DNA copy of the 5′ splice site of a GU-AG intron, or of the A or G at the 3′ splice site, will disrupt splicing because the correct intron-exon boundary will no longer be recognized. This may mean that the intron is not removed from the pre-mRNA, but it is more likely that a cryptic splice site (see page 289) will be used as an alternative. It is also possible for a mutation within an intron or an exon to create a new cryptic site that is preferred over a genuine splice site that is not itself mutated. Both types of event have the same result: relocation of the active splice site, leading to aberrant splicing. This might delete part of the resulting protein, add a new stretch of amino acids, or lead to a frameshift. Several versions of the blood disease β-thalassemia are caused by mutations that lead to cryptic splice site selection during processing of β-globin transcripts.
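The dependence of splicing on the terminal dinucleotides of a GU-AG intron can be illustrated with a simple check on the DNA copy of the intron. This sketch tests only the GT and AG boundaries mentioned above and ignores every other sequence feature involved in splicing. The intron sequence and function names are invented for illustration.

```python
def has_intact_boundaries(intron_dna):
    """True if the DNA copy of the intron starts with GT and ends with AG."""
    intron_dna = intron_dna.upper()
    return intron_dna.startswith("GT") and intron_dna.endswith("AG")

def point_mutate(seq, position, new_base):
    """Return `seq` with the base at 0-based `position` replaced."""
    return seq[:position] + new_base + seq[position + 1:]

# Invented intron sequence, written as the DNA copy (T in place of U).
intron = "GTAAGTCCTTTTCAG"

print(has_intact_boundaries(intron))                         # True
print(has_intact_boundaries(point_mutate(intron, 0, "A")))   # False: 5' G lost
print(has_intact_boundaries(point_mutate(intron, 14, "C")))  # False: 3' G lost
print(has_intact_boundaries(point_mutate(intron, 6, "A")))   # True: internal change
```

A mutation at either boundary removes the recognized dinucleotide, whereas the internal change in this example leaves both boundaries intact. Whether an internal mutation creates a cryptic site is not something this check can detect.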

Box 14.1

Mutation detection. Rapid procedures for detecting mutations in DNA molecules. Many genetic diseases are caused by point mutations that result in modification or inactivation of a gene product. Methods for detecting these mutations are important in two (more...)

The effects of mutations on multicellular organisms

Now we turn to the indirect effects that mutations have on organisms, beginning with multicellular diploid eukaryotes such as humans. The first issue to consider is the relative importance of the same mutation in a somatic cell compared with a germ cell. Because somatic cells do not pass copies of their genomes to the next generation, a somatic cell mutation is important only for the organism in which it occurs: it has no potential evolutionary impact. In fact, most somatic cell mutations have no significant effect, even if they result in cell death, because there are many other identical cells in the same tissue and the loss of one cell is immaterial. An exception is when a mutation causes a somatic cell to malfunction in a way that is harmful to the organism, for instance by inducing tumor formation or other cancerous activity.

Mutations in germ cells are more important because they can be transmitted to members of the next generation and will then be present in all the cells of any individual who inherits the mutation. Most mutations, including all silent ones and many in coding regions, will still not change the phenotype of the organism in any significant way. Those that do have an effect can be divided into two categories:

  • Loss-of-function is the normal result of a mutation that reduces or abolishes a protein activity. Most loss-of-function mutations are recessive (Section 5.2.3), because in a heterozygote the second chromosome copy carries an unmutated version of the gene coding for a fully functional protein whose presence compensates for the effect of the mutation (Figure 14.14). There are some exceptions where a loss-of-function mutation is dominant, one example being haploinsufficiency, where the organism is unable to tolerate the approximately 50% reduction in protein activity suffered by the heterozygote. This is the explanation for a few genetic diseases in humans, including Marfan syndrome which results from a mutation in the gene for the connective tissue protein called fibrillin.
  • Gain-of-function mutations are much less common. The mutation must be one that confers an abnormal activity on a protein. Many gain-of-function mutations are in regulatory sequences rather than in coding regions, and can therefore have a number of consequences. For example, a mutation might lead to one or more genes being expressed in the wrong tissues, these tissues gaining functions that they normally lack. Alternatively the mutation could lead to overexpression of one or more genes involved in control of the cell cycle, thus leading to uncontrolled cell division and hence to cancer. Because of their nature, gain-of-function mutations are usually dominant.

Figure 14.14. A loss-of-function mutation is usually recessive because a functional version of the gene is present on the second chromosome copy.

Assessing the effects of mutations on the phenotypes of multicellular organisms can be difficult. Not all mutations have an immediate impact: some are delayed onset and only confer an altered phenotype later in the individual's life. Others display non-penetrance in some individuals, never being expressed even though the individual has a dominant mutation or is a homozygous recessive. With humans, these factors complicate attempts to map disease-causing mutations by pedigree analysis (Section 5.2.4) because they introduce uncertainty about which members of a pedigree carry a mutant allele.

The effects of mutations on microorganisms

Mutations in microbes such as bacteria and yeast can also be described as loss-of-function or gain-of-function, but with microorganisms this is neither the normal nor the most useful classification scheme. Instead, a more detailed description of the phenotype is usually attempted on the basis of the growth properties of mutated cells in various culture media. This enables most mutations to be assigned to one of four categories:

  • Auxotrophs are cells that will only grow when provided with a nutrient not required by the unmutated organism. For example, E. coli normally makes its own tryptophan, courtesy of the enzymes coded by the five genes in the tryptophan operon (Figure 2.20B). If one of these genes is mutated in such a way that its protein product is inactivated, then the cell is no longer able to make tryptophan and so becomes a tryptophan auxotroph. It cannot survive on a medium that lacks tryptophan, and can grow only when this amino acid is provided as a nutrient (Figure 14.15). Unmutated bacteria, which do not require extra supplements in their growth media, are called prototrophs.
  • Conditional-lethal mutants are unable to withstand certain growth conditions: under permissive conditions they appear to be entirely normal but when transferred to restrictive conditions the mutant phenotype is seen. Temperature-sensitive mutants are typical examples of conditional-lethal mutants. Temperature-sensitive mutants behave like wild-type cells at low temperatures but exhibit their mutant phenotype when the temperature is raised above a certain threshold, which is different for each mutant. Usually this is because the mutation reduces the stability of a protein, so the protein becomes unfolded and hence inactive when the temperature is raised.
  • Inhibitor-resistant mutants are able to resist the toxic effects of an antibiotic or another type of inhibitor. There are various molecular explanations for this type of mutant. In some cases the mutation changes the structure of the protein that is targeted by the inhibitor, so the latter can no longer bind to the protein and interfere with its function. This is the basis of streptomycin resistance in E. coli, which results from a change in the structure of ribosomal protein S12. Another possibility is that the mutation changes the properties of a protein responsible for transporting the inhibitor into the cell, this often being the way in which resistance to toxic metals is acquired.
  • Regulatory mutants have defects in promoters and other regulatory sequences. This category includes constitutive mutants, which continuously express genes that are normally switched on and off under different conditions. For example, a mutation in the operator sequence of the lactose operon (Section 9.3.1) can prevent the repressor from binding and so results in the lactose operon being expressed all the time, even when lactose is absent and the genes should be switched off (Figure 14.16).

Figure 14.15. A tryptophan auxotrophic mutant.

A tryptophan auxotrophic mutant. Two Petri-dish cultures are shown. Both contain minimal medium, which provides just the basic nutritional requirements for bacterial growth (nitrogen, carbon and energy sources, plus some salts). The medium on the left (more...)

Figure 14.16. The effect of a constitutive mutation in the lactose operator.

The effect of a constitutive mutation in the lactose operator. The operator sequence has been altered by a mutation and the lactose repressor can no longer bind to it. The result is that the lactose operon is transcribed all the time, even when lactose (more...)

In addition to these four categories, many mutations are lethal and so result in death of the mutant cell, whereas others have no effect. The latter are less common in microorganisms than in higher eukaryotes, because most microbial genomes are relatively compact, with little non-coding DNA. Mutations can also be leaky, meaning that a less extreme form of the mutant phenotype is expressed. For example, a leaky version of the tryptophan auxotroph illustrated in Figure 14.15 would grow slowly on minimal medium, rather than not growing at all.
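
The distinction between a prototroph, a tight auxotroph and a leaky auxotroph can be expressed as a simple decision rule based on growth in two media. The sketch below is illustrative only: the growth values are relative to wild type, and the 0.5 cutoff for 'leaky' is an arbitrary assumption, not a value taken from the text.

```python
def classify_trp_phenotype(growth_minimal: float, growth_with_trp: float,
                           leaky_cutoff: float = 0.5) -> str:
    """Classify a strain from growth relative to wild type (0.0 = none, 1.0 = wild-type rate).

    leaky_cutoff is a hypothetical threshold chosen for illustration.
    """
    if growth_with_trp <= 0.0:
        return "no growth (lethal or unrelated defect)"
    if growth_minimal <= 0.0:
        return "tryptophan auxotroph"
    if growth_minimal < leaky_cutoff * growth_with_trp:
        return "leaky tryptophan auxotroph"
    return "prototroph"


wild_type = classify_trp_phenotype(1.0, 1.0)     # grows equally well on both media
tight_mutant = classify_trp_phenotype(0.0, 1.0)  # no growth on minimal medium
leaky_mutant = classify_trp_phenotype(0.2, 1.0)  # slow growth on minimal medium

print(wild_type, tight_mutant, leaky_mutant, sep="\n")
```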

14.1.3. Hypermutation and the possibility of programmed mutations

Is it possible for cells to utilize mutations in a positive fashion, either by increasing the rate at which mutations appear in their genomes, or by directing mutations towards specific genes? Both types of event might appear, at first glance, to go against the accepted wisdom that mutations occur randomly but, as we shall see, hypermutation and programmed mutations are possible without contravening this dogma.

Hypermutation occurs when a cell allows the rate at which mutations occur in its genome to increase. Several examples of hypermutation are known, one of these forming part of the mechanism used by vertebrates, including humans, to generate a diverse array of immunoglobulin proteins. We have already touched on this phenomenon in Section 12.2.1 when we examined the genome rearrangements that result in joining of the V, D, J and C segments of the immunoglobulin heavy- and light-chain genes (see Figure 12.15). Additional diversity is produced by hypermutation of the V-gene segments after assembly of the intact immunoglobulin gene (Figure 14.17), the mutation rate for these segments being 6–7 orders of magnitude greater than the background mutation rate experienced by the rest of the genome (Shannon and Weigert, 1998). This enhanced mutation rate appears to result from the unusual behavior of the mismatch repair system which normally corrects replication errors. At all other positions within the genome, the mismatch repair system corrects errors of replication by searching for mismatches and replacing the nucleotide in the daughter strand, this being the strand that has just been synthesized and so contains the error (see Section 14.2.3). At V-gene segments, the repair system changes the nucleotide in the parent strand, and so stabilizes the mutation rather than correcting it (Cascalho et al., 1998). The mechanism by which this is achieved has not yet been described.

Figure 14.17. Hypermutation of the V-gene segment of an intact immunoglobulin gene.

Hypermutation of the V-gene segment of an intact immunoglobulin gene. See Figure 12.15 for a description of the events leading to assembly of an immunoglobulin gene.

An apparent increase in mutation rate arising from modifications to the normal DNA repair process does not contradict the dogma regarding the randomness of mutations. However, problems have arisen with reports, dating back to 1988 (Cairns et al., 1988), which suggested that E. coli is able to direct mutations towards genes whose mutation would be advantageous under the environmental conditions that the bacterium is encountering. The original experiments involved a strain of E. coli that has a nonsense mutation in the lactose operon, inactivating the proteins needed for utilization of this sugar (Research Briefing 14.1). The bacteria were spread on an agar medium in which the only carbon source was lactose. This meant that a cell could grow and divide only if a second mutation occurred in the lactose operon, reversing the effects of the nonsense mutation and therefore allowing the lactose enzymes to be synthesized. Mutations with this effect appeared to occur significantly more frequently than expected, and at a rate that was greater than mutations in other parts of the genomes of these E. coli cells.

Box 14.1

Programmed mutations? In 1988 startling results were published suggesting that under some circumstances Escherichia coli bacteria are able to mutate in a directed way that enables cells to adapt to an environmental stress. The randomness of mutations (more...)

These experiments suggested that bacteria can program mutations according to the selective pressures that they are placed under. In other words, the environment can directly affect the phenotype of the organism, as suggested by Lamarck, rather than operating through the random processes postulated by Darwin. With such radical implications, it is not surprising that the experiments have been debated at length, with numerous attempts to discover flaws in their design or alternative explanations for the results. Variations of the original experimental system have suggested that the results are authentic, and similar events in other bacteria have been described. Models based on gene amplification rather than selective mutation are being tested (Andersson et al., 1998), and attention has also been directed at the possible roles of recombination events such as transposition of insertion elements in the generation of programmed mutations (Foster, 1999).

14.2. DNA Repair

In view of the thousands of damage events that genomes suffer every day, coupled with the errors that occur when the genome replicates, it is essential that cells possess efficient repair systems. Without these repair systems a genome would not be able to maintain its essential cellular functions for more than a few hours before key genes became inactivated by DNA damage. Similarly, cell lineages would accumulate replication errors at such a rate that their genomes would become dysfunctional after a few cell divisions.

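Most cells possess four different categories of DNA repair system (Figure 14.18; Lindahl and Wood, 1999):

  • Direct repair systems act directly on damaged nucleotides, converting each one back to its original structure (Section 14.2.1).
  • Excision repair involves excision of a segment of the polynucleotide containing a damaged site, followed by resynthesis of the correct sequence by a DNA polymerase (Section 14.2.2).
  • Mismatch repair corrects errors of replication, again by excising a stretch of the daughter polynucleotide and filling in the resulting gap (Section 14.2.3).
  • Non-homologous end-joining is used to mend double-strand breaks (Section 14.2.4).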

Figure 14.18. Four categories of DNA repair system.

Four categories of DNA repair system. See the text for details.

Most if not all organisms also possess systems that enable them to replicate damaged regions of their genome without prior repair. We will examine these systems in Section 14.2.5, and in Section 14.2.6 we will survey the human diseases that result from defects in DNA repair processes.

14.2.1. Direct repair systems fill in nicks and correct some types of nucleotide modification

Most of the types of DNA damage that are caused by chemical or physical mutagens (Section 14.1.1) can only be repaired by excision of the damaged nucleotide followed by resynthesis of a new stretch of DNA, as shown in Figure 14.18B. Only a few types of damaged nucleotide can be repaired directly:

  • Nicks can be repaired by a DNA ligase if all that has happened is that a phosphodiester bond has been broken, without damage to the 5′-phosphate and 3′-hydroxyl groups of the nucleotides either side of the nick (Figure 14.19). This is often the case with nicks resulting from the effects of ionizing radiation.
  • Some forms of alkylation damage are directly reversible by enzymes that transfer the alkyl group from the nucleotide to their own polypeptide chains. Enzymes capable of doing this are known in many different organisms and include the Ada enzyme of E. coli, which is involved in an adaptive process that this bacterium is able to activate in response to DNA damage. Ada removes alkyl groups attached to the oxygen groups at positions 4 and 6 of thymine and guanine, respectively, and can also repair phosphodiester bonds that have become methylated. Other alkylation repair enzymes have more restricted specificities, an example being human MGMT (O6-methylguanine-DNA methyltransferase) which, as its name suggests, only removes alkyl groups from position 6 of guanine.
  • Cyclobutyl dimers are repaired by a light-dependent direct system called photoreactivation. In E. coli, the process involves the enzyme called DNA photolyase (more correctly named deoxyribodipyrimidine photolyase). When stimulated by light with a wavelength between 300 and 500 nm the enzyme binds to cyclobutyl dimers and converts them back to the original monomeric nucleotides. Photoreactivation is a widespread but not universal type of repair: it is known in many but not all bacteria and also in quite a few eukaryotes, including some vertebrates, but is absent in humans and other placental mammals. A similar type of photoreactivation involves the (6-4) photoproduct photolyase and results in repair of (6-4) lesions. Neither E. coli nor humans have this enzyme but it is possessed by a variety of other organisms.

Figure 14.19. Repair of a nick by DNA ligase.

14.2.2. Excision repair

The direct types of damage reversal described above are important, but they form a very minor component of the DNA repair mechanisms of most organisms. This point is illustrated by the draft human genome sequences, which appear to contain just a single gene coding for a protein involved in direct repair (the MGMT gene), but which have at least 40 genes for components of the excision repair pathways (Wood et al., 2001). These pathways fall into two categories:

  • Base excision repair involves removal of a damaged nucleotide base, excision of a short piece of the polynucleotide around the AP site thus created, and resynthesis with a DNA polymerase.
  • Nucleotide excision repair is similar to base excision repair but is not preceded by removal of a damaged base and can act on more substantially damaged areas of DNA.

We will examine each of these pathways in turn.

Base excision repairs many types of damaged nucleotide

Base excision is the least complex of the various repair systems that involve removal of one or more damaged nucleotides followed by resynthesis of DNA to span the resulting gap. It is used to repair many modified nucleotides whose bases have suffered relatively minor damage resulting from, for example, exposure to alkylating agents or ionizing radiation (Section 14.1.1). The process is initiated by a DNA glycosylase which cleaves the β-N-glycosidic bond between a damaged base and the sugar component of the nucleotide (Figure 14.20A). Each DNA glycosylase has a limited specificity (Table 14.3), the specificities of the glycosylases possessed by a cell determining the range of damaged nucleotides that can be repaired by the base excision pathway. Most organisms are able to deal with deaminated bases such as uracil (deaminated cytosine) and hypoxanthine (deaminated adenine), oxidation products such as 5-hydroxycytosine and thymine glycol, and methylated bases such as 3-methyladenine, 7-methylguanine and 2-methylcytosine (Seeberg et al., 1995). Other DNA glycosylases remove normal bases as part of the mismatch repair system (Section 14.2.3). Most of the DNA glycosylases involved in base excision repair are thought to diffuse along the minor groove of the DNA double helix in search of damaged nucleotides, but some may be associated with the replication enzymes.

Figure 14.20. Base excision repair.

Base excision repair. (A) Excision of a damaged nucleotide by a DNA glycosylase. (B) Schematic representation of the base excision repair pathway. Alternative versions of the pathway are described in the text.

Table 14.3. Examples of human DNA glycosylases.

A DNA glycosylase removes a damaged base by ‘flipping’ the structure to a position outside of the helix and then detaching it from the polynucleotide (Kunkel and Wilson, 1996; Roberts and Cheng, 1998). This creates an AP or baseless site (see Figure 14.10) which is converted into a single nucleotide gap in the second step of the repair pathway (Figure 14.20B). This step can be carried out in a variety of ways. The standard method makes use of an AP endonuclease, such as exonuclease III or endonuclease IV of E. coli or human APE1, which cuts the phosphodiester bond on the 5′ side of the AP site. Some AP endonucleases can also remove the sugar from the AP site, this being all that remains of the damaged nucleotide, but others lack this ability and so work in conjunction with a separate phosphodiesterase. An alternative pathway for converting the AP site into a gap utilizes the endonuclease activity possessed by some DNA glycosylases, which can make a cut at the 3′ side of the AP site, probably at the same time that the damaged base is removed, followed again by removal of the sugar by a phosphodiesterase.

The single nucleotide gap is filled by a DNA polymerase, using base-pairing with the undamaged base in the other strand of the DNA molecule to ensure that the correct nucleotide is inserted. In E. coli the gap is filled by DNA polymerase I and in mammals by DNA polymerase β (see Table 13.2; Sobol et al., 1996). Yeast seems to be unusual in that it uses its main DNA replicating enzyme, DNA polymerase δ, for this purpose (Seeberg et al., 1995). After gap filling, the final phosphodiester bond is put in place by a DNA ligase.
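
The logic of the pathway (remove the damaged base, then fill the single-nucleotide gap by base-pairing with the undamaged strand) can be sketched as follows. This is a toy model under stated assumptions: the only damage considered is uracil arising from deamination of cytosine, the two strands are written so that position i of one pairs with position i of the other (strand polarity is ignored), and the function names are invented for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def remove_damaged_bases(strand: str) -> list:
    """Mimic a uracil-specific glycosylase plus gap formation: each U becomes a gap (None)."""
    return [None if base == "U" else base for base in strand]


def fill_gaps(gapped: list, template: str) -> str:
    """Mimic the polymerase: fill each gap with the partner of the template base."""
    return "".join(
        COMPLEMENT[t] if base is None else base
        for base, t in zip(gapped, template)
    )


template = "TACGGA"   # undamaged strand
damaged = "ATGUCT"    # partner strand in which one C has been deaminated to U

gapped = remove_damaged_bases(damaged)
repaired = fill_gaps(gapped, template)

print(gapped)     # gap at index 3
print(repaired)   # ATGCCT
```

The final ligation step is implicit here, because the strand is modelled as a simple string rather than as nucleotides joined by individual phosphodiester bonds.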

Nucleotide excision repair is used to correct more extensive types of damage

Nucleotide excision repair has a much broader specificity than the base excision system and is able to deal with more extreme forms of damage such as intrastrand crosslinks and bases that have become modified by attachment of large chemical groups. It is also able to correct cyclobutyl dimers by a dark repair process, providing those organisms that do not have the photoreactivation system (such as humans) with a means of repairing this type of damage.

In nucleotide excision repair, a segment of single-stranded DNA containing the damaged nucleotide(s) is excised and replaced with new DNA. The process is therefore similar to base excision repair except that it is not preceded by selective base removal, and a longer stretch of polynucleotide is excised. The best studied example of nucleotide excision repair is the short patch process of E. coli, so called because the region of polynucleotide that is excised and subsequently ‘patched’ is relatively short, usually 12 nucleotides in length.

Short patch repair is initiated by a multienzyme complex called the UvrABC endonuclease, sometimes also referred to as the ‘excinuclease’. In the first stage of the process a trimer comprising two UvrA proteins and one copy of UvrB attaches to the DNA at the damaged site. How the site is recognized is not known but the broad specificity of the process indicates that individual types of damage are not directly detected and that the complex must search for a more general attribute of DNA damage such as distortion of the double helix. UvrA may be the part of the complex most involved in damage location because it dissociates once the site has been found and plays no further part in the repair process. Departure of UvrA allows UvrC to bind (Figure 14.21), forming a UvrBC dimer that cuts the polynucleotide either side of the damaged site. The first cut is made by UvrB at the fifth phosphodiester bond downstream of the damaged nucleotide, and the second cut is made by UvrC at the eighth phosphodiester bond upstream, resulting in the 12 nucleotide excision, although there is some variability, especially in the position of the UvrB cut site. The excised segment is then removed, usually as an intact oligonucleotide, by DNA helicase II, which presumably detaches the segment by breaking the base pairs holding it to the second strand. UvrC also detaches at this stage, but UvrB remains in place and bridges the gap produced by the excision. The bound UvrB is thought to prevent the single-stranded region that has been exposed from base-pairing with itself, but alternative roles could be to prevent this strand from becoming damaged, or possibly to direct the DNA polymerase to the site that needs to be repaired. As in base excision repair, the gap is filled by DNA polymerase I and the last phosphodiester bond is synthesized by DNA ligase.
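
The arithmetic behind the 12-nucleotide excision can be checked with a short sketch. The numbering convention is an assumption made for illustration: nucleotides are indexed in the 5′→3′ direction, bond k joins nucleotides k and k+1, the first bond downstream of the damaged nucleotide d is bond d, and the first bond upstream is bond d−1. The function name is hypothetical.

```python
def short_patch_excision(damage_index: int,
                         downstream_bond: int = 5,
                         upstream_bond: int = 8) -> tuple:
    """Return (first, last) indices, inclusive, of the excised nucleotides.

    Assumed convention: bond k joins nucleotides k and k+1 (5'->3' numbering).
    """
    cut_downstream = damage_index + downstream_bond - 1  # bond cut by UvrB
    cut_upstream = damage_index - upstream_bond          # bond cut by UvrC
    first = cut_upstream + 1   # first nucleotide after the upstream cut
    last = cut_downstream      # last nucleotide before the downstream cut
    return first, last


first, last = short_patch_excision(damage_index=20)
excised_length = last - first + 1

print(first, last, excised_length)   # 13 24 12
```

Under this convention the excised segment comprises seven nucleotides upstream of the damage, the damaged nucleotide itself and four nucleotides downstream, which accounts for the 12-nucleotide patch. Changing the cut positions models the variability mentioned in the text.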

Figure 14.21. Short patch nucleotide excision repair in Escherichia coli.

Short patch nucleotide excision repair in Escherichia coli. The damaged nucleotide is shown distorting the helix because this is thought to be one of the recognition signals for the UvrAB trimer that initiates the short patch process. See the text for (more...)

E. coli also has a long patch nucleotide excision repair system that involves Uvr proteins but differs in that the piece of DNA that is excised can be anything up to 2 kb in length. Long patch repair has been less well studied and the process is not understood in detail, but it is presumed to work on more extensive forms of damage, possibly regions where groups of nucleotides, rather than just individual ones, have become modified. The eukaryotic nucleotide excision repair process is also called ‘long patch’ but results in replacement of only 24–29 nucleotides of DNA. In fact, there is no ‘short patch’ system in eukaryotes and the name is used to distinguish the process from base excision repair. The system is more complex than in E. coli and the relevant enzymes do not seem to be homologs of the Uvr proteins. In humans at least 16 proteins are involved, with the downstream cut being made at the same position as in E. coli - the fifth phosphodiester bond - but with a more distant upstream cut, resulting in the longer excision. Both cuts are made by endonucleases that attack single-stranded DNA specifically at its junction with a double-stranded region, indicating that before the cuts are made the DNA around the damage site has been melted, presumably by a helicase (Figure 14.22). This activity is provided at least in part by TFIIH, one of the components of the RNA polymerase II initiation complex (see Table 9.5). At first it was assumed that TFIIH simply had a dual role in the cell, functioning separately in both transcription and repair, but now it is thought that there is a more direct link between the two processes (Lehmann, 1995; Svejstrup et al., 1996). This view is supported by the discovery of transcription-coupled repair, which repairs some forms of damage in the template strands of genes that are being actively transcribed. The first type of transcription-coupled repair to be discovered was a modified version of nucleotide excision, but it is now known that base excision repair is also coupled with transcription (Cooper et al., 1997). These discoveries do not imply that nontranscribed regions of the genome are not repaired. The excision repair processes protect the entire genome from damage, but it is entirely logical that special mechanisms should exist for directing the processes at genes that are being transcribed. The template strands of these genes contain the genome's biological information and maintaining their integrity should be the highest priority for the repair systems.

Figure 14.22. Outline of the events involved during nucleotide excision repair in eukaryotes.

Outline of the events involved during nucleotide excision repair in eukaryotes. The endonucleases that remove the damaged region make cuts specifically at the junction between single-stranded and double-stranded regions of a DNA molecule. The DNA is therefore (more...)

14.2.3. Mismatch repair: correcting errors of replication

Each of the repair systems that we have looked at so far (direct, base excision and nucleotide excision repair) recognizes and acts upon DNA damage caused by mutagens. This means that these systems search for abnormal chemical structures such as modified nucleotides, cyclobutyl dimers and intrastrand crosslinks. They cannot, however, correct mismatches resulting from errors in replication because the mismatched nucleotide is not abnormal in any way; it is simply an A, C, G or T that has been inserted at the wrong position. As these nucleotides look exactly like any other nucleotide, the mismatch repair system that corrects replication errors has to detect not the mismatched nucleotide itself but the absence of base-pairing between the parent and daughter strands. Once it has found a mismatch, the repair system excises part of the daughter polynucleotide and fills in the gap, in a manner similar to base and nucleotide excision repair.

The scheme described above leaves one important question unanswered. The repair must be made in the daughter polynucleotide because it is in this newly synthesized strand that the error has occurred; the parent polynucleotide has the correct sequence. How does the repair process know which strand is which? In E. coli the answer is that the daughter strand is, at this stage, undermethylated and can therefore be distinguished from the parent polynucleotide, which has a full complement of methyl groups. E. coli DNA is methylated because of the activities of the DNA adenine methylase (Dam), which converts adenines to 6-methyladenines in the sequence 5′-GATC-3′, and the DNA cytosine methylase (Dcm), which converts cytosines to 5-methylcytosines in 5′-CCAGG-3′ and 5′-CCTGG-3′. These methylations are not mutagenic, the modified nucleotides having the same base-pairing properties as the unmodified versions. There is a delay between DNA replication and methylation of the daughter strand, and it is during this window of opportunity that the repair system scans the DNA for mismatches and makes the required corrections in the undermethylated, daughter strand (Figure 14.23).
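
The strand-discrimination principle can be illustrated with a minimal sketch. It is a deliberate simplification: methylation is reduced to a single flag per strand rather than being tracked at individual 5′-GATC-3′ sites, the two strands are written so that position i of one pairs with position i of the other, and only the mismatched nucleotide is replaced rather than a longer excised segment. All names and sequences are invented for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_mismatches(strand_a: str, strand_b: str) -> list:
    """Return positions where the two aligned strands are not base-paired."""
    return [i for i, (a, b) in enumerate(zip(strand_a, strand_b))
            if COMPLEMENT[a] != b]


def mismatch_repair(strands: list) -> str:
    """Correct the undermethylated (daughter) strand using the methylated parent.

    strands: two dicts of the form {"seq": str, "methylated": bool}.
    """
    parent = next(s for s in strands if s["methylated"])
    daughter = next(s for s in strands if not s["methylated"])
    repaired = list(daughter["seq"])
    for i in find_mismatches(parent["seq"], daughter["seq"]):
        repaired[i] = COMPLEMENT[parent["seq"][i]]  # parent sequence is trusted
    return "".join(repaired)


duplex = [
    {"seq": "AAGATCGT", "methylated": True},    # parent strand
    {"seq": "TTCTAGTA", "methylated": False},   # daughter strand with one error
]

mismatch_positions = find_mismatches(duplex[0]["seq"], duplex[1]["seq"])
corrected_daughter = mismatch_repair(duplex)

print(mismatch_positions)    # [6]
print(corrected_daughter)    # TTCTAGCA
```

If the methylation flags were swapped, the same function would 'repair' the parent strand instead and so fix the error in place, which is why the delay before methylation of the daughter strand matters.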

Figure 14.23. Methylation of newly synthesized DNA in Escherichia coli does not occur immediately after replication, providing a window of opportunity for the mismatch repair proteins to recognize the daughter strands and correct replication errors.

E. coli has at least three mismatch repair systems, called ‘long patch’, ‘short patch’ and ‘very short patch’, the names indicating the relative lengths of the excised and resynthesized segments. The long patch system replaces up to a kb or more of DNA and requires the MutH, MutL and MutS proteins, as well as the DNA helicase II that we met during nucleotide excision repair. MutS recognizes the mismatch and MutH distinguishes the two strands by binding to unmethylated 5′-GATC-3′ sequences (Figure 14.24). The role of MutL is unclear but it might coordinate the activities of the other two proteins so that MutH binds to 5′-GATC-3′ sequences only in the vicinity of mismatch sites recognized by MutS. After binding, MutH cuts the phosphodiester bond immediately upstream of the G in the methylation sequence and DNA helicase II detaches the single strand. There does not appear to be an enzyme that cuts the strand downstream of the mismatch; instead the detached single-stranded region is degraded by an exonuclease that follows the helicase and continues beyond the mismatch site. The gap is then filled in by DNA polymerase I and DNA ligase. Similar events are thought to occur during short and very short mismatch repair, the difference being the specificities of the proteins that recognize the mismatch. The short patch system, which results in excision of a segment less than 10 nucleotides in length, begins when MutY recognizes an A-G or A-C mismatch, and the very short repair system corrects G-T mismatches which are recognized by the Vsr endonuclease.

Figure 14.24. Long patch mismatch repair in Escherichia coli.

Long patch mismatch repair in Escherichia coli. See the text for details.

Eukaryotes have homologs of the E. coli Mut proteins and their mismatch repair processes probably work in a similar way (Kolodner, 2000). The one difference is that methylation might not be the method used to distinguish between the parent and daughter polynucleotides. Methylation has been implicated in mismatch repair in mammalian cells, but the DNA of some eukaryotes, including fruit flies and yeast, is not extensively methylated; it is thought that these organisms must therefore use a different method. Possibilities include an association between the repair enzymes and the replication complex, so that repair is coupled with DNA synthesis, or use of single-strand binding proteins that mark the parent strand.

14.2.4. Repair of double-stranded DNA breaks

A single-stranded break in a double-stranded DNA molecule, such as is produced during the base and nucleotide excision repair processes and by some types of oxidative damage, does not present the cell with a critical problem. The double helix retains its overall intactness and the break can be repaired by template-dependent DNA synthesis (Figure 14.25A). A double-stranded break is more serious because this converts the original double helix into two separate fragments which have to be brought back together again in order for the break to be repaired (Figure 14.25B). The two broken ends must be protected from further degradation, which could result in a deletion mutation appearing at the repaired break point. The repair processes must also ensure that the correct ends are joined: if there are two broken chromosomes in the nucleus, then the correct pairs must be brought together so that the original structures are restored. Experimental studies of mouse cells indicate that achieving this outcome is difficult and if two chromosomes are broken then misrepair resulting in hybrid structures occurs relatively frequently (Richardson and Jasin, 2000). Even if only one chromosome is broken, there is still a possibility that a natural chromosome end could be mistaken for a break and an incorrect repair made. This type of error is not unknown, despite the presence of special telomere-binding proteins that mark the natural ends of chromosomes (Section 2.2.1).

Figure 14.25. Single- and double-strand-break repair. (A) A single-strand break does not disrupt the integrity of the double helix. The exposed single strand is coated with PARP1 proteins, which protect this intact strand from breaking and prevent it from participating ...

Double-strand breaks are generated by exposure to ionizing radiation and some chemical mutagens, and are also made by the cell, in a controlled fashion, during recombination events such as the genome rearrangements that join together immunoglobulin gene segments and T-cell receptor gene segments in B and T lymphocytes (Section 12.2.1). Progress in understanding the break repair system has been stimulated by studies of mutant human cell lines, which have resulted in the identification of various sets of genes involved in the process (Critchlow and Jackson, 1998). These genes specify a multi-component protein complex that directs a DNA ligase to the break (Figure 14.26). The complex includes a protein called Ku, made up of two non-identical subunits, which binds the DNA ends on either side of the break (Walker et al., 2001). Ku binds to the DNA in association with the DNA-PKcs protein kinase, which activates a third protein, XRCC4. XRCC4 in turn interacts with the mammalian DNA ligase IV, directing this repair protein to the double-strand break.

Figure 14.26. Non-homologous end-joining (NHEJ) in humans. (A) The repair process. Additional proteins not shown in the diagram are also involved in NHEJ. These include the protein kinases ATM and ATR (Section 13.3.2), whose main role may be to signal to the cell the ...

The repair process is called non-homologous end-joining (NHEJ), the name indicating that there is no need for homology between the two molecules whose ends are being joined, unlike other end-joining mechanisms that we will encounter when we study recombination in Section 14.3. NHEJ is regarded as a type of recombination because, as well as repairing breaks, it can be used to join molecules or fragments that were not previously joined, producing new combinations. A version of the NHEJ system is probably used during construction of immunoglobulin and T-cell receptor genes, but the details are likely to be different because these programmed rearrangements of the genome involve intermediate structures, such as DNA hairpin loops, that are not seen during the repair of DNA breaks resulting from damage.

14.2.5. Bypassing DNA damage during genome replication

If a region of the genome has suffered extensive damage then it is conceivable that the repair processes will be overwhelmed. The cell then faces a stark choice between dying and attempting to replicate the damaged region, even though this replication may be error-prone and result in mutated daughter molecules. When faced with this choice E. coli cells invariably take the second option, by inducing one of several emergency procedures for bypassing sites of major damage. The best studied of these bypass processes is the SOS response, which enables the cell to replicate its DNA even though the template polynucleotides contain AP sites and/or cyclobutyl dimers and other photoproducts, resulting from exposure to chemical mutagens or UV radiation, that would normally block, or at least delay, the replication complex. Bypass of these sites requires construction of a mutasome, comprising the UmuD′₂C complex (also called DNA polymerase V, a trimer made up of two UmuD′ proteins and one copy of UmuC) and several copies of the RecA protein (Goodman, 2000). The latter is a single-stranded DNA-binding protein that coats the damaged strands, enabling the UmuD′₂C complex to displace DNA polymerase III and carry out error-prone DNA synthesis until the damaged region has been passed and DNA polymerase III can take over once again (Figure 14.27).

Figure 14.27. The SOS response of Escherichia coli. See the text for details.

The SOS response is primarily looked on as the last best chance that the bacterium has to replicate its DNA and hence survive under adverse conditions. However, the price of survival is an increased mutation rate because the mutasome does not repair damage, it simply allows a damaged region of a polynucleotide to be replicated. When it encounters a damaged position in the template DNA, the polymerase selects a nucleotide more or less at random, although with some preference for placing an A opposite an AP site: in effect the error rate of the replication process increases. It has been suggested that this increased mutation rate is the purpose of the SOS response, mutation being in some way an advantageous response to DNA damage, but this idea remains controversial (Chicurel, 2001).
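The consequence of copying a damaged template without repairing it can be illustrated with a short simulation. This is a minimal sketch, not a description of how the mutasome actually selects nucleotides: the symbol 'X' for a damaged position, the function name and the value of `a_bias` are all assumptions made for illustration, the last of these standing in for the 'some preference' for A mentioned above.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def bypass_synthesis(template, rng, a_bias=0.6):
    """Copy a template in which 'X' marks a damaged, non-coding position.

    Undamaged positions are copied by normal base pairing. At 'X' an A is
    inserted with probability a_bias (an illustrative value, not a measured
    one); otherwise one of the four nucleotides is chosen at random.
    Position i of the result pairs with position i of the template.
    """
    new_strand = []
    for base in template:
        if base == "X":
            if rng.random() < a_bias:
                new_strand.append("A")
            else:
                new_strand.append(rng.choice("ACGT"))
        else:
            new_strand.append(COMPLEMENT[base])
    return "".join(new_strand)

rng = random.Random(0)  # seeded so that a run can be repeated
print(bypass_synthesis("ATXGC", rng))
```

The damaged position is always filled, so replication can continue, but the nucleotide placed there need not match what the undamaged template would have specified. This is the sense in which the error rate of replication increases.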

For some time, the SOS response was thought to be the only damage-bypass process in bacteria, but we now appreciate that at least two other E. coli polymerases act in a similar way, although with different types of damage. These are DNA polymerase II, which can bypass nucleotides bound to mutagenic chemicals such as N-2-acetylaminofluorene, and DNA polymerase IV (also called DinB), which can replicate through a region of template DNA in which the two parent polynucleotides have become misaligned (Lindahl and Wood, 1999; Hanaoka, 2001). Bypass polymerases have also been discovered in eukaryotic cells. These include DNA polymerase η, which can bypass cyclobutyl dimers (Johnson et al., 1999), and DNA polymerases ι and ζ, which work together to replicate through photoproducts and AP sites (Johnson et al., 2000).

14.2.6. Defects in DNA repair underlie human diseases, including cancers

The importance of DNA repair is emphasized by the number and severity of inherited human diseases that have been linked with defects in one of the repair processes. One of the best characterized of these is xeroderma pigmentosum, which results from a mutation in any one of several genes for proteins involved in nucleotide excision repair. Nucleotide excision is the only way in which human cells can repair cyclobutyl dimers and other photoproducts, so it is no surprise that the symptoms of xeroderma pigmentosum include hypersensitivity to UV radiation, patients suffering more mutations than normal on exposure to sunlight, which often leads to skin cancer (Lehmann, 1995). Trichothiodystrophy is also caused by defects in nucleotide excision repair, but this is a more complex disorder which, although not involving cancer, usually includes problems with both the skin and nervous system.

A few diseases have been linked with defects in the transcription-coupled component of nucleotide excision repair. These include breast and ovarian cancers (the BRCA1 gene that confers susceptibility to these cancers codes for a protein that has been implicated, at least indirectly, in transcription-coupled repair; Gowen et al., 1998) and Cockayne syndrome, a complex disease manifested by growth and neurologic disorders (Hanawalt, 2000). A deficiency in transcription-coupled repair has also been identified in humans suffering from the cancer-susceptibility syndrome called HNPCC (hereditary non-polyposis colorectal cancer; Mellon et al., 1996), although this disease was originally identified as a defect in mismatch repair (Kolodner, 1995). Ataxia telangiectasia, the symptoms of which include sensitivity to ionizing radiation, results from defects in the ATM gene, which is involved in the damage-detection process (Section 13.3.2). Other diseases that are associated with a breakdown in DNA repair are Bloom's and Werner's syndromes, which are caused by inactivation of a DNA helicase that may have a role in NHEJ (Shen and Loeb, 2000; Wu and Hickson, 2001), and Fanconi's anemia, which confers sensitivity to chemicals that cause crosslinks in DNA but whose biochemical basis is not yet known.

14.3. Recombination

Without recombination, genomes would be relatively static structures, undergoing very little change. The gradual accumulation of mutations over a long period of time would result in small-scale alterations in the nucleotide sequence of the genome, but more extensive restructuring, which is the role of recombination, would not occur, and the evolutionary potential of the genome would be severely restricted.

Recombination was first recognized as the process responsible for crossing-over and exchange of DNA segments between homologous chromosomes during meiosis of eukaryotic cells (see Figure 5.15), and was subsequently implicated in the integration of transferred DNA into bacterial genomes after conjugation, transduction or transformation (Section 5.2.4). The biological importance of these processes stimulated the first attempts to describe the molecular events involved in recombination and led to the Holliday model (Holliday, 1964), with which we will begin our study of recombination.

14.3.1. Homologous recombination

The Holliday model refers to a type of recombination called general or homologous recombination. This is the most important version of recombination in nature, being responsible for meiotic crossing-over and the integration of transferred DNA into bacterial genomes.

The Holliday model for homologous recombination

The Holliday model describes recombination between two homologous double-stranded molecules, ones with identical or nearly identical sequences, but is equally applicable to two different molecules that share a limited region of homology, or a single molecule that recombines with itself because it contains two separate regions that are homologous with one another.

The central feature of the model is formation of a heteroduplex resulting from the exchange of polynucleotide segments between the two homologous molecules (Figure 14.28). The heteroduplex is initially stabilized by base-pairing between each transferred strand and the intact polynucleotide of the recipient molecule, this base-pairing being possible because of the sequence similarity between the two molecules. Subsequently the gaps are sealed by DNA ligase, giving a Holliday structure. This structure is dynamic, branch migration resulting in exchange of longer segments of DNA if the two helices rotate in the same direction.

Figure 14.28. The Holliday model for homologous recombination.

Separation, or resolution, of the Holliday structure back into individual double-stranded molecules occurs by cleavage across the branch point. This is the key to the entire process because the cut can be made in either of two orientations, as becomes apparent when the three-dimensional configuration or chi form of the Holliday structure is examined (see Figure 14.28). These two cuts have very different results. If the cut is made left-right across the chi form as drawn in Figure 14.28, then all that happens is that a short segment of polynucleotide, corresponding to the distance migrated by the branch of the Holliday structure, is transferred between the two molecules. On the other hand, an up-down cut results in reciprocal strand exchange, double-stranded DNA being transferred between the two molecules so that the end of one molecule is exchanged for the end of the other molecule. This is the DNA transfer seen in crossing-over.
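The different outcomes of the two cuts can be summarized by following a pair of markers that flank the exchanged region. The sketch below is a bookkeeping aid rather than a molecular model: the function name and the marker labels are hypothetical, and the short heteroduplex segment itself is not represented.

```python
def resolve(parent1, parent2, cut):
    """Return the flanking-marker arrangement of the two resolved molecules.

    parent1 and parent2 are (left_marker, right_marker) tuples for the two
    homologous molecules. cut is "left-right" or "up-down", the two
    orientations across the chi form described in the text.
    """
    (left1, right1), (left2, right2) = parent1, parent2
    if cut == "left-right":
        # Only a short single-strand segment is transferred, so the
        # flanking markers keep their parental arrangement.
        return [(left1, right1), (left2, right2)]
    if cut == "up-down":
        # Reciprocal strand exchange: the molecules swap ends.
        return [(left1, right2), (left2, right1)]
    raise ValueError("cut must be 'left-right' or 'up-down'")

print(resolve(("A", "B"), ("a", "b"), "left-right"))  # [('A', 'B'), ('a', 'b')]
print(resolve(("A", "B"), ("a", "b"), "up-down"))     # [('A', 'b'), ('a', 'B')]
```

With markers A and B on one molecule and a and b on the other, only the up-down cut gives the recombinant arrangements A b and a B.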

So far we have ignored one aspect of the Holliday model. This is the way in which the two double-stranded molecules interact at the beginning of the process to produce the heteroduplex. In the original scheme, the two molecules lined up with one another and single-stranded nicks appeared at equivalent positions in each helix. This produced free single-stranded ends that could be exchanged, resulting in the heteroduplex (Figure 14.29A). This feature of the model was criticized because no mechanism could be proposed for ensuring that the nicks occurred at precisely the same position on each molecule. The Meselson-Radding modification (Meselson and Radding, 1975) proposes a more satisfactory scheme whereby a single-stranded nick occurs in just one of the double helices, the free end that is produced ‘invading’ the unbroken double helix at the homologous position and displacing one of its strands, forming a D-loop (Figure 14.29B). Subsequent cleavage of the displaced strand at the junction between its single-stranded and base-paired regions produces the heteroduplex.

Figure 14.29. Two schemes for initiation of homologous recombination. (A) Initiation as described by the original model for homologous recombination. (B) The Meselson-Radding modification, which proposes a more plausible series of events for formation of the heteroduplex. ...

Proteins involved in homologous recombination in E. coli

The Holliday model and Meselson-Radding modification refer to homologous recombination in all organisms but, as with many areas of molecular biology, the initial progress in understanding how the process is carried out in the cell was made with E. coli. The specific recombination system that has been studied has the circular E. coli genome as one partner and a linear chromosome fragment as the second partner, this being the situation that occurs during conjugation, transduction or transformation of bacterial cells (Section 5.2.4).

Mutation studies have identified a number of E. coli genes that, when inactivated, give rise to defects in homologous recombination, indicating that their protein products are involved in the process in some way. Three distinct recombination systems have been described, these being the RecBCD, RecE and RecF pathways, with RecBCD apparently being the most important in the bacterium (Camerini-Otero and Hsieh, 1995). In this pathway, recombination is initiated by the RecBCD enzyme, which has both nuclease and helicase activities. Its precise mode of action is uncertain: in the simplest model the enzyme binds to one end of the linear molecule and unwinds it until it reaches the first copy of the eight-nucleotide consensus sequence 5′-GCTGGTGG-3′ (rather confusingly called the chi site), which occurs once every 6 kb in E. coli DNA (Blattner et al., 1997). The nuclease activity of the enzyme then makes the single-stranded nick at a position approximately 56 nucleotides to the 3′ side of the chi site (Figure 14.30). Alternative proposals have the RecBCD enzyme making nicks as it progresses along the linear DNA, this activity being inhibited when the chi site is reached, the last of these progressive nicks being equivalent to the single nick envisaged in the first model (Eggleston and West, 1996).
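The simplest model can be expressed as a search along the linear molecule. The following sketch assumes that the sequence is read 5′→3′ on the strand carrying the chi site and that the 56-nucleotide distance is measured from the 3′ end of the site; both are simplifications made for illustration, and the function names are hypothetical.

```python
CHI = "GCTGGTGG"  # eight-nucleotide chi consensus given in the text

def chi_sites(seq):
    """Return the 0-based start position of every chi site in seq."""
    positions = []
    start = seq.find(CHI)
    while start != -1:
        positions.append(start)
        start = seq.find(CHI, start + 1)
    return positions

def nick_position(seq, offset=56):
    """Index roughly `offset` nucleotides 3' of the first chi site, or None."""
    sites = chi_sites(seq)
    if not sites:
        return None
    pos = sites[0] + len(CHI) + offset
    return pos if pos <= len(seq) else None

# If the four nucleotides were equally frequent and randomly ordered, a
# given 8-mer would be expected about once every 4**8 positions.
expected_spacing_random = 4 ** 8  # 65536

seq = "A" * 10 + CHI + "T" * 100  # toy sequence with one chi site
print(chi_sites(seq), nick_position(seq))  # [10] 74
```

Under the random-sequence assumption in the final comment, a spacing of one site every 6 kb would mean that chi is roughly ten times more frequent than expected by chance.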

Figure 14.30. The RecBCD pathway for homologous recombination in Escherichia coli. The events leading to formation of the heteroduplex are shown. In the bottom structure the RecA-coated DNA filament has formed a triplex structure, which is thought to be an intermediate ...

Whatever the precise mechanism, the RecBCD enzyme produces the free single-stranded end which, according to the Meselson-Radding modification, invades the intact partner, in this case the circular E. coli genome. This stage is mediated by the RecA protein, which forms a protein-coated DNA filament that is able to invade the intact double helix and set up the D-loop (see Figure 14.30). An intermediate in formation of the D-loop is probably a triplex structure, a three-stranded DNA helix in which the invading polynucleotide lies within the major groove of the intact helix and forms hydrogen bonds with the base pairs it encounters (Camerini-Otero and Hsieh, 1995).

Branch migration is catalyzed by the RuvA and RuvB proteins, both of which attach to the branch point of the Holliday structure. X-ray crystallography studies suggest that four copies of RuvA bind directly to the branch, forming a core to which two RuvB rings, each consisting of eight proteins, attach, one to either side (Figure 14.31; Rafferty et al., 1996). The resulting structure might act as a ‘molecular motor’, rotating the helices in the required manner so that the branch point moves. The RecG protein also has a role in branch migration but it is not clear if this is in conjunction with RuvAB, or as part of an alternative mechanism (Eggleston and West, 1996).

Figure 14.31. The role of the Ruv proteins in homologous recombination in Escherichia coli. Branch migration is induced by a structure comprising four copies of RuvA bound to the Holliday junction with an RuvB ring on either side. After RuvAB has detached, two RuvC ...

Branch migration does not appear to be a random process, but instead stops preferentially at the sequence 5′-(A/T)TT(G/C)-3′. This sequence occurs frequently in the E. coli genome, so presumably migration does not halt at the first instance of the motif that is reached. When branch migration has ended, the RuvAB complex detaches and is replaced by two RuvC proteins (see Figure 14.31) which carry out the cleavage that resolves the Holliday structure. The cuts are made between the second T and the (G/C) components of the recognition sequence.

The double-strand break model for recombination in yeast

Although the Holliday model for homologous recombination, either in its original form or as modified by Meselson and Radding, explains most of the results of recombination in all organisms, it has a few inadequacies, which prompted the development of alternative schemes. In particular, it was thought that the Holliday model could not explain gene conversion, a phenomenon first described in yeast and other fungi but now known to occur in many eukaryotes. In yeast, fusion of a pair of gametes results in a zygote that gives rise to an ascus containing four haploid spores whose genotypes can be individually determined. If the gametes have different alleles at a particular locus then under normal circumstances two of the spores will display one genotype and two will display the other genotype, but sometimes this expected 2 : 2 segregation pattern is replaced by an unexpected 3 : 1 ratio (Figure 14.32). This is called gene conversion because the ratio can only be explained by one of the alleles ‘converting’ from one type to the other, presumably by recombination during the meiosis that occurs after the gametes have fused.
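The distinction between the two segregation patterns is easy to state in code. This minimal sketch, whose function names are illustrative only, takes the genotypes of the four spores in one ascus and reports the segregation ratio.

```python
from collections import Counter

def segregation_ratio(spores):
    """Return the allele counts in an ascus, largest first, e.g. (2, 2)."""
    counts = Counter(spores)
    return tuple(sorted(counts.values(), reverse=True))

def shows_gene_conversion(spores):
    """True if the four spores show the 3 : 1 pattern described in the text."""
    return segregation_ratio(spores) == (3, 1)

print(segregation_ratio(["A", "A", "a", "a"]))      # (2, 2)
print(shows_gene_conversion(["A", "A", "A", "a"]))  # True
```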

Figure 14.32. Gene conversion. One gamete contains allele A and the other contains allele a. These fuse to produce a zygote that gives rise to four haploid spores, all contained in a single ascus. Normally, two of the spores will have allele A and two will have allele ...

The double-strand break model provides an opportunity for gene conversion to take place during the recombination process. It initiates not with a single-strand nick, as in the Holliday scheme, but with a double-strand cut that breaks one of the partners in the recombination into two pieces (Figure 14.33). This might appear to be a drastic move to make but it has been shown that the protein responsible for the cut is a Type II DNA topoisomerase (Section 13.1.2) which forms covalent linkages with the two pieces of DNA and hence prevents them drifting completely apart. After the double-stranded cut, one strand in each half of the molecule is trimmed back by a 5′→3′ exonuclease, so each end now has a 3′ overhang of approximately 500 nucleotides. One of these invades the homologous DNA molecule in a manner similar to that envisaged by the Meselson-Radding scheme, setting up a Holliday junction that can migrate along the heteroduplex if the invading strand is extended by a DNA polymerase. To complete the heteroduplex, the other broken strand (the one not involved in the Holliday junction) is also extended. Note that both DNA syntheses involve extension of strands from the partner that suffered the double-stranded cut, using as templates the equivalent regions of the uncut partner. This is the basis of the gene conversion because it means that the polynucleotide segments removed from the cut partner by the exonuclease have been replaced with copies of the DNA from the uncut partner.

Figure 14.33. The double-strand break model for recombination in yeast. This model explains how gene conversion can occur. See the text for details.

The resulting heteroduplex has a pair of Holliday structures that can be resolved in a number of ways, some resulting in gene conversion and others giving a standard reciprocal strand exchange. An example leading to gene conversion is shown in Figure 14.33.

The double-strand break model has been sufficiently well characterized in yeast for there to be little doubt that it occurs, at least in a form approximating to that shown in Figure 14.33. Some of the proteins involved in recombination in yeast are very similar to their counterparts in E. coli (eukaryotic RAD51, for example, has sequence similarity with RecA and is believed to work in the same way; Baumann and West, 1998), prompting the suggestion that recombination in all organisms follows the double-strand break system. As yet there is little evidence to support this idea, particularly for the larger chromosomes of higher eukaryotes, and many geneticists resist the suggestion that vertebrate DNA undergoes frequent double-strand breaks during meiosis.

Box 14.2. The RecE and RecF recombination pathways of Escherichia coli. The RecBCD, RecE and RecF pathways involve similar mechanisms and share several of the same proteins. RecA is involved in each pathway, RecF is a component of both the RecE and RecF pathways, ...

14.3.2. Site-specific recombination

A region of extensive homology is not a prerequisite for recombination: the process can also be initiated between two DNA molecules that have only very short sequences in common. This is called site-specific recombination and it has been extensively studied because of the part that it plays during the infection cycle of bacteriophage λ.

Integration of λ DNA into the E. coli genome involves site-specific recombination

After injecting its DNA into an E. coli cell, bacteriophage λ can follow either of two infection pathways (Section 4.2.1). One of these, the lytic pathway, results in the rapid synthesis of λ coat proteins, combined with replication of the λ genome, leading to death of the bacterium and release of new phages within about 20 minutes of the initial infection. In contrast, if the phage follows the lysogenic pathway, new phages do not immediately appear. The bacterium divides as normal, possibly for many cell divisions, with the phage in a quiescent form called the prophage. Eventually, possibly as the result of DNA damage or some other stimulus, the phage becomes active again, replicating its genome, directing synthesis of coat proteins, and bursting from the cell.

During the lysogenic phase the λ genome becomes integrated into the E. coli chromosome. It is therefore replicated whenever the E. coli DNA is copied, and so is passed on to daughter cells as if a standard part of the bacterium's genome. Integration occurs by site-specific recombination between the att sites, one on the λ genome and one on the E. coli chromosome, which have at their center an identical 15-bp sequence (Figure 14.34). Because this is recombination between two circular molecules, the result is that one bigger circle is formed; in other words the λ DNA becomes integrated into the bacterial genome. A second site-specific recombination between the two att sites, now both contained in the same molecule, reverses the original process and releases the λ DNA, which can now return to the lytic mode of infection and direct synthesis of new phages.
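The topology of integration and excision can be followed with a toy model in which each circular molecule is written as a string. In this sketch the 15-nucleotide core and the flanking sequences are invented placeholders, not the real att sequences, and the function names are hypothetical.

```python
CORE = "CAGTCTGATCGGATA"  # placeholder 15-nt core, not the real att sequence

def rotate_to(circle, site):
    """Rewrite a circular sequence so that it starts at the given site."""
    doubled = circle + circle
    i = doubled.find(site)
    if i == -1:
        raise ValueError("site not present in circle")
    return doubled[i:i + len(circle)]

def integrate(host, phage, core=CORE):
    """Recombine two circles at their shared core, giving one larger circle."""
    return rotate_to(phage, core) + rotate_to(host, core)

def excise(lysogen, core=CORE):
    """Recombine the two cores within one circle, releasing the phage circle."""
    first = lysogen.find(core)
    second = lysogen.find(core, first + 1)
    if first == -1 or second == -1:
        raise ValueError("two copies of the core are required")
    phage = lysogen[first:second]
    host = lysogen[second:] + lysogen[:first]
    return host, phage

HOST = "CCCCC" + CORE + "GGGGG"   # toy bacterial circle
PHAGE = "AAAA" + CORE + "TTTT"    # toy phage circle

lysogen = integrate(HOST, PHAGE)
print(len(lysogen) == len(HOST) + len(PHAGE), lysogen.count(CORE))  # True 2
```

The integrated molecule contains every nucleotide of both circles and carries a copy of the core at each boundary of the inserted DNA, which is why a second recombination between these two copies can reverse the process.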

Figure 14.34. Integration of the bacteriophage λ genome into Escherichia coli chromosomal DNA. Both λ and E. coli DNA have a copy of the att site, each one comprising an identical central sequence called ‘O’ and flanking sequences P ...

The recombination event is catalyzed by a specialized Type I topoisomerase (Section 13.1.2) called integrase (Kwon et al., 1997), a member of a diverse family of recombinases present in bacteria, archaea and yeast. The enzyme makes a staggered double-stranded cut at equivalent positions in the λ and bacterial att sites. The two short single-stranded overhangs are then exchanged between the DNA molecules, producing a Holliday junction which migrates a few base pairs along the heteroduplex before being cleaved. This cleavage, providing that it is made in the appropriate orientation, resolves the Holliday structure in such a way that the λ DNA becomes inserted into the E. coli genome. A similar process underlies excision, which is also carried out by integrase, but in conjunction with a second protein, ‘excisionase’, coded by the λ xis gene. If integrase could carry out excision on its own then it would probably excise the λ DNA as soon as it had integrated it.

14.3.3. Transposition

Transposition is not a type of recombination but a process that utilizes recombination, the end result being the transfer of a segment of DNA from one position in the genome to another. A characteristic feature of transposition is that the transferred segment is flanked by a pair of short direct repeats (Figure 14.35) which, as we will see, are formed during the transposition process.

Figure 14.35. Integrated transposable elements are flanked by short direct repeat sequences. This particular transposon is flanked by the tetranucleotide repeat 5′-CTGG-3′. Other transposons have different direct repeat sequences.

In Section 2.4.2 we examined the various types of transposable element known in eukaryotes and prokaryotes and discovered that these could be broadly divided into three categories on the basis of their transposition mechanism (Figure 14.36):

  • DNA transposons that transpose replicatively, the original transposon remaining in place and a new copy appearing elsewhere in the genome;
  • DNA transposons that transpose conservatively, the original transposon moving to a new site by a cut-and-paste process;
  • Retroelements, all of which transpose via an RNA intermediate.

Figure 14.36. Replicative and conservative transposition. DNA transposons use either the replicative or conservative pathway (some can use both). Retroelements transpose replicatively via an RNA intermediate.

We will now examine the recombination events that are responsible for each of these three types of transposition.

Box 14.3. DNA methylation and transposition. Transposition can have deleterious effects on a genome. These effects go beyond the obvious disruption of gene activity that will occur if a transposable element takes up a new position that lies within the coding region ...

Replicative and conservative transposition of DNA transposons

A number of models for replicative and conservative transposition have been proposed over the years but most are modifications of a scheme originally outlined by Shapiro (1979). According to this model, the replicative transposition of a bacterial element such as a Tn3-type transposon or a transposable phage (Section 2.4.2) is initiated by one or more endonucleases that make single-stranded cuts either side of the transposon and in the target site where the new copy of the element will be inserted (Figure 14.37). At the target site the two cuts are separated by a few base pairs, so that the cleaved double-stranded molecule has short 5′ overhangs.

Figure 14.37. A model for the process resulting in replicative and conservative transposition. See the text for details.

Ligation of these 5′ overhangs to the free 3′ ends either side of the transposon produces a hybrid molecule in which the original two DNAs (the one containing the transposon and the one containing the target site) are linked together by the transposable element flanked by a pair of structures resembling replication forks. DNA synthesis at these replication forks copies the transposable element and converts the initial hybrid into a co-integrate, in which the two original DNAs are still linked. Homologous recombination between the two copies of the transposon uncouples the co-integrate, separating the original DNA molecule (with its copy of the transposon still in place) from the target molecule, which now contains a copy of the transposon. Replicative transposition has therefore occurred.

A modification of the process just described changes the mode of transposition from replicative to conservative (see Figure 14.37). Rather than carrying out DNA synthesis, the hybrid structure is converted back into two separate DNA molecules simply by making additional single-stranded nicks either side of the transposon. This cuts the transposon out of its original molecule, leaving it ‘pasted’ into the target DNA.

Transposition of retroelements

From the human perspective, the most important retroelements are the retroviruses, which include the human immunodeficiency viruses that cause AIDS and various other virulent types. Most of what we know about retrotransposition refers specifically to retroviruses, although it is believed that other retroelements, such as retrotransposons of the Ty1/copia and Ty3/gypsy families, transpose by similar mechanisms.

The first step in retrotransposition is synthesis of an RNA copy of the inserted retroelement (Figure 14.38). The long terminal repeat (LTR) at the 5′ end of the element contains a TATA sequence which acts as a promoter for transcription by RNA polymerase II (Section 9.2.2). Some retroelements also have enhancer sequences (Section 9.3) that are thought to regulate the amount of transcription that occurs. Transcription continues through the entire length of the element, up to a polyadenylation sequence (Section 10.1.2) in the 3′ LTR. The transcript now acts as the template for RNA-dependent DNA synthesis, catalyzed by a reverse transcriptase enzyme coded by part of the pol gene of the retroelement (see Figure 2.26). Because this is synthesis of DNA, a primer is required (Section 13.2.2), and as during genome replication, the primer is made of RNA rather than DNA. During genome replication, the primer is synthesized de novo by a polymerase enzyme (see Figure 13.12), but retroelements do not code for RNA polymerases and so cannot make primers in this way. Instead they use one of the cell's tRNA molecules as a primer, the particular tRNA depending on the retroelement: the Ty1/copia family of elements always use tRNA(Met) but other retroelements use different tRNAs.

Figure 14.38. Transposition of a retroelement - Part 1. This diagram shows how an integrated retroelement is copied into a free double-stranded DNA version. The first step is synthesis of an RNA copy, which is then converted to double-stranded DNA by a series of events ...

The tRNA primer anneals to a site within the 5′ LTR (see Figure 14.38). At first glance this appears to be a strange location for the priming site because it means that DNA synthesis is directed away from the central region of the retroelement and so results in only a short copy of part of the 5′ LTR. In fact, when the DNA copy has been extended to the end of the LTR, a part of the RNA template is degraded and the DNA overhang that is produced re-anneals to the 3′ LTR of the retroelement which, being a long terminal repeat, has the same sequence as the 5′ LTR and so can base-pair with the DNA copy. DNA synthesis now continues along the RNA template, eventually displacing the tRNA primer. Note that the result is a DNA copy of the entire template, including the priming site: the template switching is, in effect, the strategy that the retroelement uses to solve the ‘end-shortening’ problem, the same problem that chromosomal DNAs address through telomere synthesis (Section 13.2.4).

Completion of synthesis of the first DNA strand results in a DNA-RNA hybrid. The RNA is partially degraded by an RNase H enzyme, coded by another part of the pol gene. The RNA that is not degraded, usually just a single fragment attached to a short polypurine sequence adjacent to the 3′ LTR, primes synthesis of the second DNA strand, again by reverse transcriptase, which is able to act as both an RNA- and DNA-dependent DNA polymerase. As with the first round of DNA synthesis, second-strand synthesis initially results in a DNA copy of just the LTR, but a second template switch, to the other end of the molecule, enables the DNA copy to be extended until it is full length. This creates a template for further extension of the first DNA strand, so that the resulting double-stranded DNA is a complete copy of the internal region of the retroelement plus the two LTRs.

All that remains is to insert the new copy of the retroelement into the genome. It was originally thought that insertion occurred randomly, but it now appears that although no particular sequence is used as a target site, integration occurs preferentially at certain positions (Devine and Boeke, 1996). Insertion involves removal of two nucleotides from the 3′ ends of the double-stranded retroelement by the integrase enzyme (coded by yet another part of pol). The integrase also makes a staggered cut in the genomic DNA so that both the retroelement and the integration site now have 5′ overhangs (Figure 14.39). These overhangs might not have complementary sequences but they still appear to interact in some way so that the retroelement becomes inserted into the genomic DNA. The interaction results in loss of the retroelement overhangs and filling in of the gaps that are left, which means that the integration site becomes duplicated into a pair of direct repeats, one at either end of the inserted retroelement.
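The way in which a staggered cut leads to a pair of direct repeats can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of the integrase mechanism: only one strand is represented, the function name `integrate` and its parameters are hypothetical, the four-nucleotide stagger follows the legend to Figure 14.39, and the two nucleotides lost from each end of the retroelement are modeled simply by trimming the string.

```python
def integrate(genome: str, element: str, cut_pos: int,
              stagger: int = 4, trim: int = 2) -> str:
    """Insert `element` into `genome` at a staggered cut.

    The two strands are assumed to be cut `stagger` nucleotides apart,
    starting at `cut_pos`. After the gaps are filled in, the sequence
    between the two cuts is present on both sides of the insertion.
    `trim` nucleotides are lost from each end of the element.
    """
    if not 0 <= cut_pos <= len(genome) - stagger:
        raise ValueError("cut site must lie within the genome")
    if 2 * trim > len(element):
        raise ValueError("element is too short to trim")

    # Top-strand view: everything up to the second cut, then the element,
    # then everything from the first cut onwards. The target site
    # genome[cut_pos:cut_pos + stagger] therefore appears twice.
    left = genome[:cut_pos + stagger]
    right = genome[cut_pos:]
    inserted = element[trim:len(element) - trim]
    return left + inserted + right


genome = "AAACGTATTT"
result = integrate(genome, "TGAAAACCCCCA", cut_pos=3)
target_site = genome[3:7]
print(result)
print(result.count(target_site))
```

With these example sequences the target site CGTA is present once in the original genome and twice in the product, once on either side of the inserted element.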

Figure 14.39. Transposition of a retroelement - Part 2. Integration of the retroelement into the genome results in a four-nucleotide direct repeat either side of the inserted sequence. With retroviruses, this stage in the transposition pathway requires both integrase ...

Study Aids For Chapter 14

Key terms

Give short definitions of the following terms:

Self study questions


Distinguish between the terms ‘mutation’, ‘DNA repair’ and ‘recombination’.


Explain how errors in DNA replication can lead to mutations.


Giving examples, summarize the key features of trinucleotide repeat expansion diseases.


List the various types of chemical and physical agents that have mutagenic properties. Give at least one example of each type of agent and describe the types of mutation that they cause.


Describe the various effects that a mutation can have on the coding properties of a genome.


Distinguish between the effects of mutations on the somatic and germ cells of multicellular organisms.


Name and define the four major types of mutant phenotype recognized in bacteria.


Describe, with examples, what is meant by the terms ‘hypermutation’ and ‘programmed mutation’.


Distinguish between the various types of DNA repair mechanism that are known.


Compare and contrast the direct DNA repair systems of bacteria and eukaryotes.


Give detailed descriptions of the base excision and nucleotide excision repair processes of bacteria and eukaryotes.


Describe the mismatch repair processes of bacteria and eukaryotes, paying attention to the ways in which the daughter and parent strands are recognized in the two types of organism.


Define the term ‘non-homologous end-joining’ and explain how this process results in the repair of double-strand breaks in DNA molecules.


How can DNA damage be bypassed during genome replication in Escherichia coli and eukaryotes?


Discuss the links between DNA repair and human disease.


Draw a fully annotated diagram of the Holliday model for homologous recombination.


In what way does the Meselson-Radding modification improve the Holliday model for homologous recombination?


Describe the functions of each of the proteins thought to be involved in homologous recombination in Escherichia coli.


Draw a fully annotated diagram of the double-strand break model for recombination in yeast.


Describe how site-specific recombination underlies insertion and excision of the λ genome into and out of the Escherichia coli chromosome.


Explain how recombination events can result in the replicative or conservative transposition of a DNA sequence.


Draw a fully annotated diagram illustrating the transposition mechanism of a retrovirus.

Problem-based learning


Explore the current knowledge concerning trinucleotide repeat expansion diseases, including hypotheses that attempt to explain why triplet expansion in these genes leads to a disease.


‘Not all mutations have an immediate impact: some are delayed onset and only confer an altered phenotype later in the individual's life. Others display non-penetrance in some individuals, never being expressed even though the individual has a dominant mutation or is a homozygous recessive.’ Devise mechanisms to explain how mutations can exhibit delayed onset or non-penetrance.


Evaluate the evidence for programmed mutations.


The bacterium Deinococcus radiodurans is highly resistant to radiation and to other physical and chemical mutagens. Discuss how these special properties of D. radiodurans are reflected in its genome sequence. (See White O, Eisen JA, Heidelberg JF, et al. [1999] Genome sequence of the radioresistant bacterium Deinococcus radiodurans R1. Science, 286, 1571-1577.)


Assess the general importance of the double-strand break model for gene conversion in yeast. Is there evidence for this type of recombination in organisms other than yeast?


References

  1. Andersson DI, Slechta ES, Roth JR. Evidence that gene amplification underlies adaptive mutability of the bacterial lac operon. Science. (1998);282:1133–1135. [PubMed: 9804552]
  2. Ashley CT, Warren ST. Trinucleotide repeat expansion and human disease. Ann. Rev. Genet. (1995);29:703–728. [PubMed: 8825491]
  3. Baumann P, West SC. Role of the human RAD51 protein in homologous recombination and double-stranded-break repair. Trends Biochem. Sci. (1998);23:247–251. [PubMed: 9697414]
  4. Blattner FR, Plunkett G, Bloch CA. et al. The complete genome sequence of Escherichia coli K-12. Science. (1997);277:1453–1462. [PubMed: 9278503]
  5. Cairns J, Overbaugh J, Miller S. The origin of mutants. Nature. (1988);335:142–145. [PubMed: 3045565]
  6. Camerini-Otero RD, Hsieh P. Homologous recombination proteins in prokaryotes and eukaryotes. Ann. Rev. Genet. (1995);29:509–552. [PubMed: 8825485]
  7. Campuzano V, Montermini L, Moltò MD. et al. Friedreich's ataxia: autosomal recessive disease caused by an intronic GAA triplet repeat expansion. Science. (1996);271:1423–1427. [PubMed: 8596916]
  8. Cascalho M, Wong J, Steinberg C, Wabl M. Mismatch repair co-opted by hypermutation. Science. (1998);279:1207–1210. [PubMed: 9469811]
  9. Chicurel M. Can organisms speed their own mutation? Science. (2001);292:1824–1827. [PubMed: 11397928]
  10. Cooper PK, Nouspikel T, Clarkson SG, Leadon SA. Defective transcription-coupled repair of oxidative base damage in Cockayne syndrome patients from XP group G. Science. (1997);275:990–993. [PubMed: 9020084]
  11. Cotton RGH. Slowly but surely towards better scanning for mutations. Trends Genet. (1997);13:43–46. [PubMed: 9055601]
  12. Critchlow SE, Jackson SP. DNA end-joining: from yeast to man. Trends Biochem. Sci. (1998);23:394–398. [PubMed: 9810228]
  13. Cummings CJ, Zoghbi HY. Trinucleotide repeats: mechanisms and pathophysiology. Annu. Rev. Genomics Hum. Genet. (2000);1:281–328. [PubMed: 11701632]
  14. Daniel R, Katz RA, Skalka AM. A role for DNA-PK in retroviral DNA integration. Science. (1999);284:644–647. [PubMed: 10213687]
  15. Devine SE, Boeke JD. Integration of the yeast retrotransposon Ty1 is targeted to regions upstream of genes transcribed by RNA polymerase III. Genes Devel. (1996);10:620–633. [PubMed: 8598291]
  16. Eggleston AK, West SC. Exchanging partners in E. coli. Trends Genet. (1996);12:20–26. [PubMed: 8741856]
  17. Foster PL. Mechanism of stationary phase mutation: a decade of adaptive mutation. Ann. Rev. Genet. (1999);33:57–88. [PMC free article: PMC2922758] [PubMed: 10690404]
  18. Francino MP, Ochman H. Strand asymmetries in DNA evolution. Trends Genet. (1997);13:240–245. [PubMed: 9196330]
  19. Freudenreich CH, Kantrow SM, Zakian VA. Expansion and length-dependent fragility of CTG repeats in yeast. Science. (1998);279:853–856. [PubMed: 9452383]
  20. Gacy AM, Goellner GM, Spiro C. et al. GAA instability in Friedreich's Ataxia shares a common, DNA-directed and intraallelic mechanism with other trinucleotide diseases. Mol. Cell. (1998);1:583–593. [PubMed: 9660942]
  21. Goodman MF. Coping with replication ‘train wrecks’ in Escherichia coli using Pol V, Pol II and RecA proteins. Trends Biochem. Sci. (2000);25:189–195. [PubMed: 10754553]
  22. Gowen LC, Avrutskaya AV, Latour AM, Koller BH, Leadon SA. BRCA1 required for transcription-coupled repair of oxidative DNA damage. Science. (1998);281:1009–1012. [PubMed: 9703501]
  23. Hanaoka F. SOS polymerases. Nature. (2001);409:33–34. [PubMed: 11343100]
  24. Hanawalt PC. The bases for Cockayne syndrome. Nature. (2000);405:415–416. [PubMed: 10839526]
  25. Holliday R. A mechanism for gene conversion in fungi. Genet. Res. (1964);5:282–304.
  26. Johnson RE, Prakash S, Prakash L. Efficient bypass of a thymine-thymine dimer by yeast DNA polymerase, Polη. Science. (1999);283:1001–1004. [PubMed: 9974380]
  27. Johnson RE, Washington MT, Haracska L, Prakash S, Prakash L. Eukaryotic polymerases ι and ζ act sequentially to bypass DNA lesions. Nature. (2000);406:1015–1019. [PubMed: 10984059]
  28. Kolodner RD. Mismatch repair: mechanisms and relationship to cancer susceptibility. Trends Biochem. Sci. (1995);20:397–401. [PubMed: 8533151]
  29. Kolodner RDE. Guarding against mutation. Nature. (2000);407:687–689. [PubMed: 11048703]
  30. Kunkel TA, Wilson SH. Push and pull of base flipping. Nature. (1996);384:25–26. [PubMed: 8900270]
  31. Kwon HJ, Tirumalai R, Landy A, Ellenberger T. Flexibility in DNA recombination: structure of the lambda integrase catalytic core. Science. (1997);276:126–131. [PMC free article: PMC1839824] [PubMed: 9082984]
  32. Lehmann AR. Nucleotide excision repair and the link with transcription. Trends Biochem. Sci. (1995);20:402–405. [PubMed: 8533152]
  33. Lindahl T, Wood RD. Quality control by DNA repair. Science. (1999);286:1897–1905. [PubMed: 10583946]
  34. Mandel J-L. Breaking the rule of three. Nature. (1997);386:767–769. [PubMed: 9126731]
  35. Mellon I, Rajpal DK, Koi M, Boland CR, Champe GN. Transcription-coupled repair deficiency and mutations in human mismatch repair genes. Science. (1996);272:557–560. [PubMed: 8614807]
  36. Meselson M, Radding CM. A general model for genetic recombination. Proc. Natl Acad. Sci. USA. (1975);72:358–361. [PMC free article: PMC432304] [PubMed: 1054510]
  37. Perutz MF. Glutamine repeats and neurodegenerative diseases: molecular aspects. Trends Biochem. Sci. (1999);24:58–63. [PubMed: 10098399]
  38. Rafferty JB, Sedelnikova SE, Hargreaves D. et al. Crystal structure of DNA recombination protein RuvA and a model for its binding to the Holliday junction. Science. (1996);274:415–421. [PubMed: 8832889]
  39. Richardson C, Jasin M. Frequent chromosomal translocations induced by DNA double-strand breaks. Nature. (2000);405:697–700. [PubMed: 10864328]
  40. Roberts RJ, Cheng X. Base flipping. Ann. Rev. Biochem. (1998);67:181–198. [PubMed: 9759487]
  41. Seeberg E, Eide L, Bjørås M. The base excision repair pathway. Trends Biochem. Sci. (1995);20:391–397. [PubMed: 8533150]
  42. Shannon M, Weigert M. Fixing mismatches. Science. (1998);279:1159–1160. [PubMed: 9508690]
  43. Shapiro JA. Molecular model for the transposition and replication of bacteriophage Mu and other transposable elements. Proc. Natl Acad. Sci. USA. (1979);76:1933–1937. [PMC free article: PMC383507] [PubMed: 287033]
  44. Shen J-C, Loeb LA. The Werner syndrome gene: the molecular basis of RecQ helicase-deficiency diseases. Trends Genet. (2000);16:213–220. [PubMed: 10782115]
  45. Sobol RW, Horton JK, Kühn R. et al. Requirement of mammalian DNA polymerase-β in base-excision repair. Nature. (1996);379:183–186. [PubMed: 8538772]
  46. Sutherland GR, Baker E, Richards RI. Fragile sites still breaking. Trends Genet. (1998);14:501–506. [PubMed: 9865156]
  47. Svejstrup JQ, Vichi P, Egly J-M. The multiple roles of transcription factor/repair factor TFIIH. Trends Biochem. Sci. (1996);21:346–350. [PubMed: 8870499]
  48. Symer DE, Bender J. Hip-hopping out of control. Nature. (2001);411:146–149. [PubMed: 11346774]
  49. Walker JR, Corpina RA, Goldberg J. Structure of the Ku heterodimer bound to DNA and its implications for double-strand break repair. Nature. (2001);412:607–614. [PubMed: 11493912]
  50. Wood RD, Mitchell M, Sgouros J, Lindahl T. Human DNA repair genes. Science. (2001);291:1284–1289. [PubMed: 11181991]
  51. Wu L, Hickson ID. DNA ends RecQ-uire attention. Science. (2001);292:229–230. [PubMed: 11305313]

Further Reading

  1. Buermeyer AB, Deschênes SM, Baker SM, Liskay RM. Mammalian DNA mismatch repair. Ann. Rev. Genet. (1999);33:533–564. [PubMed: 10690417]
  2. Harfe BD, Jinks-Robertson S. DNA mismatch repair and genetic instability. Ann. Rev. Genet. (2000);34:359–399. Comprehensive details of mismatch repair in bacteria and eukaryotes. [PubMed: 11092832]
  3. Kowalczykowski SC. Initiation of genetic recombination and recombination-dependent replication. Trends Biochem. Sci. (2000);25:156–165. Contains useful summaries of various aspects of recombination. [PubMed: 10754547]
  4. Kunkel TA, Bebenek K. DNA replication fidelity. Ann. Rev. Biochem. (2000);69:497–529. Covers the processes that ensure that the minimum number of errors are made during DNA replication. [PubMed: 10966467]
  5. Shinagawa H, Iwasaki H. Processing the Holliday junction in homologous recombination. Trends Biochem. Sci. (1996);21:107–111. An illuminating description of the central event in recombination. [PubMed: 8882584]
  6. Sutton MD, Smith BT, Godoy VG, Walker GC. The SOS response: recent insights into umuDC-dependent mutagenesis and DNA damage tolerance. Ann. Rev. Genet. (2000);34:479–497. [PubMed: 11092836]
  7. West SC. Processing of recombination intermediates by the RuvABC proteins. Ann. Rev. Genet. (1997);31:213–244. Comprehensive information on branch migration and resolution of Holliday junctions. [PubMed: 9442895]
Copyright © 2002, Garland Science.
Bookshelf ID: NBK21114

