NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Strachan T, Read AP. Human Molecular Genetics. 2nd edition. New York: Wiley-Liss; 1999.

Chapter 9. Instability of the human genome: mutation and DNA repair

9.1. An overview of mutation, polymorphism, and DNA repair

As in other genomes, the DNA of the human genome is not a static entity. Instead, it is subject to a variety of different types of heritable change (mutation). Large-scale chromosome abnormalities involve loss or gain of chromosomes or breakage and rejoining of chromatids (see Section 2.6). Smaller scale mutations can be grouped into different mutation classes and can also be categorized on the basis of whether they involve a single DNA sequence (simple mutations - Section 9.2) or whether they involve exchanges between two allelic or nonallelic sequences (Section 9.3). Three classes of small-scale mutation can be distinguished (see also Table 9.1):

Table 9.1. Incidence of mutation classes in the human genome.

  • Base substitutions - involve replacement of usually a single base; in rare cases several clustered bases may be replaced simultaneously as a result of a form of gene conversion.
  • Deletions - one or more nucleotides are eliminated from a sequence.
  • Insertions - one or more nucleotides are inserted into a sequence. In rare cases this involves transposition from another locus. Copy or duplicative transposition involves a sequence from one locus being replicated and the copy inserted into another locus. Noncopy transposition involves simple transposition of a DNA sequence from one locus to another. In human and mammalian genomes, noncopy transposition is very rare: the great majority of DNA transposition occurs via an RNA intermediate so that the insertion is of a sequence copied from another locus.

New mutations arise in single individuals, in somatic cells or in the germline. If a germline mutation does not seriously impair an individual's ability to have offspring who can transmit the mutation, it can spread to other members of a (sexual) population. Allelic sequence variation is traditionally described as a DNA polymorphism if more than one variant (allele) at a locus occurs in a human population with a frequency greater than 0.01 (a frequency high enough such that an origin as a result of chance recurrence is highly unlikely). The mean heterozygosity for human genomic DNA is thought to be of the order of 0.001–0.004 (i.e. approximately 1:250 to 1:1000 bases are different between allelic sequences; Cooper et al., 1985; Nickerson et al., 1998; Taillon-Miller et al., 1998). Certain genes, notably some HLA genes, are exceptionally polymorphic and alleles can show very substantial sequence divergence (see Figure 14.27). Because mutation rates are comparatively low the vast majority of the differences between allelic sequences within an individual are inherited, rather than resulting from de novo mutations.

Mutations are the raw fuel that drives evolution, but they can also be pathogenic (Sections 9.4 and 9.5). They can be the direct cause of a phenotypic abnormality or they can result in increased susceptibility to disease. The usually low level of mutation may therefore be viewed as a trade-off: occasional evolutionary novelty is permitted at the expense of disease or death in a proportion of the members of a species. Normally, most mutations arise as copying errors during DNA replication because DNA polymerases, like all enzymes, are error-prone. The error rate of a DNA polymerase (that is, the frequency of incorporating a wrong base) is significantly reduced by having a subunit of the polymerase which has a proofreading function. Even then, however, the size of the human genome makes huge demands on the fidelity of any DNA polymerase: a sequence of 3 billion nucleotides needs to be replicated accurately every single time a human cell divides.

DNA is also subject to significant spontaneous chemical attack in the cell. For example, every day approximately 5000 adenines or guanines are lost from the DNA of each nucleated human cell by depurination (the N-glycosidic bond linking the purine residue to the carbon 1′ of the deoxyribose is hydrolyzed and the purine is replaced by a hydroxyl group at carbon 1′). DNA is also damaged by exposure to natural ionizing radiation and to reactive metabolites. In order to minimize the mutation rate, therefore, it is necessary to have effective DNA repair systems which identify and correct many abnormalities in the DNA sequence (Section 9.6). In addition, errors that arise in the mRNA sequence during gene expression are subject to RNA surveillance mechanisms which ensure removal of mRNAs which have inappropriate termination codons (Section 9.4.6).

9.2. Simple mutations

9.2.1. Mutations due to errors in DNA replication and repair are frequent

Mutations can be induced in our DNA by exposure to a variety of mutagens occurring in our external environment or to mutagens generated in the intracellular environment. In the case of radiation-induced mutation, for example, Dubrova et al. (1996) reported that the normal germline mutation rate for hypervariable minisatellite loci was doubled as a consequence of heavy exposure to the radioactive fallout from the Chernobyl accident. However, under normal circumstances by far the greatest source of mutations is endogenous, notably spontaneous errors in DNA replication and repair. During an average human lifetime there are an estimated 10¹⁷ cell divisions: about 2 × 10¹⁴ divisions are required to generate the approximately 10¹⁴ cells in the adult, and additional mitoses are required to permit cell renewal in the case of certain cell types, notably epithelial cells (see Cairns, 1975). As each cell division requires the incorporation of 6 × 10⁹ new nucleotides, error-free DNA replication in an average lifetime would require a DNA replication-repair process with an accuracy great enough so that the correct nucleotide was inserted on the growing DNA strands on each of about 6 × 10²⁶ occasions.

Such a level of DNA replication fidelity is impossible to sustain; indeed, the observed fidelity of replication of DNA polymerases is very much less than this and uncorrected replication errors occur with a frequency of about 10⁻⁹ to 10⁻¹¹ per incorporated nucleotide (see Cooper et al., 1995). As the coding DNA of an average human gene is about 1.7 kb, coding DNA mutations will occur spontaneously with an average frequency of about 1.7 × 10⁻⁶ to 1.7 × 10⁻⁸ per gene per cell division. Thus, during the approximately 10¹⁶ mitoses undergone in an average human lifetime, each gene will be a locus for about 10⁸ to 10¹⁰ mutations (but for any one gene, only a tiny minority of cells will carry a mutation). In many cases, a deleterious gene mutation in a somatic cell will be inconsequential: the mutation may cause lethality for that single cell, but will not have consequences for other cells. However, in some cases, the mutation may lead to an inappropriate continuation of cell division, causing cancer (see Chapter 18).
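
The arithmetic in this section can be checked with a short sketch. The figures are the ones quoted in the text; the function and constant names are illustrative only. Because the text uses 10¹⁷ cell divisions for the nucleotide-incorporation estimate and about 10¹⁶ mitoses for the per-gene estimate, the number of divisions is passed as a parameter rather than fixed.

```python
# Rough check of the replication-fidelity figures quoted in the text.
# All names are illustrative; the numbers are the text's estimates.

NUCLEOTIDES_PER_DIVISION = 6e9   # new nucleotides incorporated per cell division
CODING_LENGTH_NT = 1.7e3         # coding DNA of an average human gene (about 1.7 kb)


def total_incorporations(divisions):
    """Nucleotide incorporations over a given number of cell divisions."""
    return NUCLEOTIDES_PER_DIVISION * divisions


def mutations_per_gene_per_division(error_rate):
    """Expected coding mutations per gene per division for a per-nucleotide error rate."""
    return CODING_LENGTH_NT * error_rate


def lifetime_mutations_per_gene(error_rate, mitoses):
    """Expected coding mutations per gene summed over all mitoses in a lifetime."""
    return mutations_per_gene_per_division(error_rate) * mitoses


if __name__ == "__main__":
    print(f"incorporations over 1e17 divisions: {total_incorporations(1e17):.1e}")
    for rate in (1e-9, 1e-11):
        per_division = mutations_per_gene_per_division(rate)
        lifetime = lifetime_mutations_per_gene(rate, 1e16)
        print(f"error rate {rate:.0e}: {per_division:.1e} per gene per division, "
              f"{lifetime:.1e} per gene over 1e16 mitoses")
```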

9.2.2. The frequency of individual base substitutions is nonrandom

Base substitutions are among the most common mutations and can be grouped into two classes:

  • Transitions are substitutions of a pyrimidine (C or T) by a pyrimidine, or of a purine (A or G) by a purine.
  • Transversions are substitutions of a pyrimidine by a purine or of a purine by a pyrimidine.

When one base is substituted by another, there are always two possible choices for transversion, but only one choice for a transition. For example, the base adenine can undergo two possible transversions (to cytosine or to thymine) but only one transition (to guanine; see Figure 9.1). One might, therefore, expect transversions to be twice as frequent as transitions. Because the substitution of alleles in a population takes thousands or even millions of years to complete, nucleotide substitutions cannot be observed directly. Instead, they are always inferred from pairwise comparisons of DNA molecules that share a common origin, such as orthologs in different species. When this is done, the transition rate in mammalian genomes is found to be unexpectedly higher than the transversion rate. For example, Collins and Jukes (1994) compared 337 pairs of human and rodent orthologs and found that the transition rate consistently exceeded the transversion rate. The ratio was 1.4 to 1 for substitutions which did not lead to an altered amino acid, and more than 2 to 1 for those that did result in an amino acid change.

Figure 9.1. Transversions are theoretically expected to be twice as frequent as transitions. Blue arrows, transversions; black arrows, transitions.
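
The 2:1 expectation can be reproduced by enumerating all twelve possible single-base substitutions. The following is a minimal sketch; the function names are illustrative and not taken from the text.

```python
# Classify single-base substitutions as transitions or transversions
# and count how many of each are possible in principle.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a substitution ref -> alt."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical: not a substitution")
    # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def count_possible_substitutions():
    """Count the possible transitions and transversions over all base pairs."""
    counts = {"transition": 0, "transversion": 0}
    for ref in "ACGT":
        for alt in "ACGT":
            if ref != alt:
                counts[classify_substitution(ref, alt)] += 1
    return counts


if __name__ == "__main__":
    print(count_possible_substitutions())
```

Of the twelve possible substitutions, four are transitions and eight are transversions, which is the naive expectation that the observed data contradict.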

Transitions may be favored over transversions in coding DNA because they usually result in a more conserved polypeptide sequence (see below). In both coding and noncoding DNA the excess of transitions over transversions is at least partly due to the comparatively high frequency of C → T transitions, resulting from instability of cytosine residues occurring in the CpG dinucleotide. In such dinucleotides the cytosine is often methylated at the 5′ C atom and 5-methylcytosines are susceptible to spontaneous deamination to give thymine (Section 8.4.2). Presumably as a result of this, the CpG dinucleotide is a hotspot for mutation in vertebrate genomes: its mutation rate is about 8.5 times higher than that of the average dinucleotide (see Cooper et al., 1995). Other factors favoring transitions over transversions are likely to include differential repair of mispaired bases by the sequence-dependent proofreading activities of the relevant DNA polymerases.

9.2.3. The frequency and spectrum of mutations in coding DNA differs from that in noncoding DNA

Many mutations are generated essentially randomly in the DNA of individuals. As a result, coding DNA and noncoding DNA are about equally susceptible to mutation. Clearly, however, the major consequences of mutation are largely restricted to the approximately 3% of the DNA in the human genome which is coding DNA. Mutations which occur in this component of the genome are of two types:

  • Silent (synonymous) mutations do not change the sequence of the gene product.
  • Nonsynonymous mutations result in an altered sequence in a polypeptide or functional RNA: one or more components of the sequence are altered or eliminated, or an additional sequence is inserted into the product.

Silent mutations are thought to be effectively neutral mutations (conferring no advantage or disadvantage to the organism in whose genome they arise). In contrast, nonsynonymous mutations can be grouped into three classes, depending on their effect: those having a deleterious effect; those with no effect; and those with a beneficial effect (e.g. improved gene function or gene-gene interaction). Most new nonsynonymous mutations are likely to have a deleterious effect on gene expression and so can result in disease or lethality. However, the frequency of such mutation in the population is very much reduced because of natural selection (see Box 9.1). As a result, the overall mutation rate in coding DNA is much less than that in noncoding DNA. Consequently, the coding DNA component of a specific gene and the derived amino acid sequence show a relatively high degree of evolutionary conservation, as do important regulatory sequences such as the multiple elements of promoters and enhancers, and intronic sequences immediately flanking exons.

Box 9.1. Mechanisms which affect the population frequency of alleles. Individuals within a population differ from each other. Much of the basis of such differences is due to inherited genetic variation. The frequency of any mutant allele in a population is dependent …

Selection pressure (the constraints imposed by natural selection) reduces both the overall frequency of surviving mutations in coding DNA and the spectrum of mutations seen. For example, deletions/insertions of one or several nucleotides are frequent in noncoding DNA but are conspicuously absent from coding DNA. This is so because often such mutations will cause a shift in the translational reading frame (frameshift mutation), introducing a premature termination codon and causing loss of gene expression. Even if insertions/deletions do not cause a frameshift mutation, they can often affect gene function, for example, as a result of removing a key coding sequence. Instead, coding DNA is marked by a comparatively high frequency of nonrandom base substitution occurring at locations which lead to minimal effects on gene expression (see next section).

9.2.4. The location of base substitutions in coding DNA is nonrandom

Nucleotide substitutions occurring in noncoding DNA usually have no net effect on gene expression. Exceptions include some changes in promoter elements or some other DNA sequence that regulates gene expression, and in important intronic sequence positions, such as at splice junctions or the splice branch site (see Figure 1.15). Substitutions occurring in coding DNA sequences which specify polypeptides show a very nonrandom pattern of substitutions because of the need to conserve polypeptide sequence and biological function. In principle, base substitutions can be grouped into three classes, depending on their effect on coding potential (see Box 9.2).

Box 9.2. Classes of single base substitution in polypeptide-encoding DNA. On very rare occasions, a single nucleotide substitution within polypeptide-encoding DNA causes defective gene expression by activating a cryptic splice site within an exon (see Figure 9.12).

The different classes of base substitution listed in the box show differential tendencies to be located at the first, second or third base positions of codons. Because of the design of the genetic code, different degrees of degeneracy characterize different sites. Base positions in codons can be grouped into three classes:

  • Nondegenerate sites are base positions where all three possible substitutions are nonsynonymous. They include the first base position of all but eight codons, the second base position of all codons and the third base position of two codons, AUG and UGG (Figure 9.2). Taking into account the observed codon frequencies in human genes, they comprise about 65% of the base positions in human codons. The base substitution rate at nondegenerate sites is very low, consistent with a strong conservative selection pressure to avoid amino acid changes (Figure 9.3).
  • Fourfold degenerate sites are base positions in which all three possible substitutions are synonymous and are found at the third base position of several codons (Figure 9.2). They comprise about 16% of the base positions in human codons. The substitution rate at fourfold sites is very similar to that within introns and pseudogenes, consistent with the assumption that synonymous substitutions are selectively neutral (Figure 9.3).
  • Twofold degenerate sites are base positions in which one of the three possible substitutions is synonymous. They are often found at the third base positions of codons, but also at the first base position in eight codons (Figure 9.2). They comprise about 19% of the base positions in human codons. As expected, the substitution rate for twofold degenerate sites is intermediate (Figure 9.3): only one out of the three possible substitutions, a transition, maintains the same amino acid. The other two possible substitutions are transversions which, because of the way in which the genetic code has evolved, are often conservative substitutions. For example, at the third base position of the glutamate codon GAA, a transition A → G is silent, while the two transversions (A → C; A → T) result in replacement by a closely similar amino acid, aspartate.

Figure 9.2. Codon frequencies in human genes and locations of nondegenerate, two- and fourfold degenerate sites. Observed codon frequencies were derived from an analysis of 1490 human genes by Wada et al. (1990). Note that although eight of the 61 first base positions …

Figure 9.3. The rate of nucleotide substitution varies in different gene components and gene-associated sequences. On the basis of the above substitution rates and the observation that an average mammalian coding DNA sequence comprises 400 codons, the coding DNA …
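
The site classes listed above follow directly from the standard genetic code, and can be recovered by counting, for each codon position, how many of the three possible substitutions leave the amino acid unchanged. The sketch below assumes the standard nuclear code and ignores codon usage, so it counts codons rather than reproducing the percentages quoted above. All names are illustrative. Note one case the text does not list separately: the third position of the three isoleucine codons has two synonymous substitutions.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
SENSE_CODONS = [codon for codon, aa in GENETIC_CODE.items() if aa != "*"]


def synonymous_changes(codon, position):
    """Count substitutions (0 to 3) at a position (0, 1 or 2) that keep the amino acid.

    0 = nondegenerate site, 1 = twofold degenerate, 3 = fourfold degenerate.
    """
    original = GENETIC_CODE[codon]
    count = 0
    for base in BASES:
        if base == codon[position]:
            continue
        mutant = codon[:position] + base + codon[position + 1:]
        if GENETIC_CODE[mutant] == original:
            count += 1
    return count


def site_class_counts(position):
    """Tally sense codons by number of synonymous substitutions at a position."""
    counts = {0: 0, 1: 0, 2: 0, 3: 0}
    for codon in SENSE_CODONS:
        counts[synonymous_changes(codon, position)] += 1
    return counts


if __name__ == "__main__":
    for position in range(3):
        print(f"codon position {position + 1}: {site_class_counts(position)}")
```

Under these assumptions the tally agrees with the description above: eight first positions are twofold degenerate, every second position is nondegenerate, and only AUG and UGG are nondegenerate at the third position.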

The design of the genetic code and the degree to which one amino acid is functionally similar to another affect the relative mutabilities of individual amino acids. Certain amino acids may play key roles which cannot be substituted easily by others. For example, cysteine is often involved in disulfide bonding which can play a crucially important role in establishing the conformation of a polypeptide (see Figure 1.25). As no other amino acid has a side chain with a sulfhydryl group, there is strong selection pressure to conserve cysteine residues at many locations, and cysteine is among the least mutable of the amino acids (Collins and Jukes, 1994). In contrast, certain other amino acids such as serine and threonine have very similar side chains, and substitutions at both the first base position of codons (ACX → UCX; where X = any nucleotide) and second base positions (ACPy → AGPy; where Py = pyrimidine) can result in threonine → serine substitutions. Presumably as a result, serine and threonine are among the most mutable of the amino acids (Collins and Jukes, 1994).

9.2.5. Protein-coding genes show enormous variation in the rate of nonsynonymous substitutions

The rate and type of substitution varies between different genes. At one extreme are proteins whose sequences are extremely highly conserved, such as ubiquitin, histones H3 and H4, calmodulin, ribosomal proteins, etc. For example, the ubiquitin proteins of humans, mouse and Drosophila show 100% sequence identity, and comparison with the yeast ubiquitin reveals 96.1% sequence identity. These genes are not especially protected from mutation, because the rate of synonymous codon substitution is typical of that for many protein-encoding genes. Instead, what distinguishes them is the extremely low rate of nonsynonymous codon substitution compared with other genes (see Table 9.2 for some examples). Presumably, ubiquitin and the other highly conserved proteins play such crucial roles that they are under huge selection pressure to conserve the sequence. At the other extreme, the fibrinopeptides are proteins which are evolving extremely rapidly and do not appear to be subject to any selective constraint. These proteins (only 20 amino acids long) are thought to be functionless - they are fragments which are generated as part of the protein fibrinogen and discarded when the protein is activated to form fibrin during blood clotting. Another extremely rapidly evolving sequence is the major sex-determining locus, SRY. This gene encodes a protein which contains a central ‘high mobility group’ domain (HMG box) of about 78 amino acids. The HMG box is central to SRY function and is well conserved, but the flanking N- and C-terminal segments are evolving extremely rapidly, which may indicate that the majority of the SRY coding sequence is not functionally significant (Whitfield et al., 1993). In between the two extremes in the rate of nonsynonymous substitution are the vast majority of polypeptide-encoding genes (see Table 9.2).

Table 9.2. Rates of synonymous and nonsynonymous substitutions in mammalian protein-coding genes.

9.2.6. The molecular clock can vary from gene to gene, and is different in different lineages

Synonymous substitutions have been considered to be effectively neutral from the point of view of selective constraints. As a result, the concept of a constant molecular clock (whereby a given gene or gene product undergoes a constant rate of molecular evolution) was suggested over 30 years ago. Since then, however, abundant evidence has been accumulated which does not support the concept (Ayala, 1999).

When substitution rates are compared for different genes, even closely related members of a gene family, there are considerable differences. For example, the genes listed in Table 9.2 show considerable differences not only in their rates of nonsynonymous codon substitutions, but also in the rate of synonymous codon substitutions. Such differences may be governed by a number of factors:

  • Timing of DNA replication. The DNA of different genomic components is replicated at different times. Actively transcribing genes are replicated early; transcriptionally inactive DNA such as the inactivated X chromosome is replicated late. Early replicating and late replicating DNA may be subject to different intracellular concentrations of free nucleotides and of the various enzymes involved in replication and DNA repair, causing possible differences in mutation rates.
  • Differences in GC content. This parameter is not independent of the previous one because the early replicating DNA is relatively GC-rich. The present evidence suggests that any relationship between the GC content of a mammalian gene and its mutation rate is not a simple one (Sharp and Matassi, 1994).
  • Different genomes. The mitochondrial DNA in mammals and many other animals is thought to be evolving at a much higher rate than nuclear DNA (see Section 9.4.3).

For a given gene, the molecular clock varies very considerably depending on the species lineage, and the clock runs at different rates for closely related members of a gene family (e.g. Gibbs et al., 1998). In order to estimate the relative rates of nucleotide substitutions in two lineages leading to present-day species A and B, a relative rate test is used. This involves using a third reference species C which is known to have branched off earlier in evolution, before the A-B split. Pairwise comparisons of orthologs in A and C, and in B and C are then used to calculate the K value, the number of synonymous substitutions per 100 sites. The KAC and KBC values then provide a measure of the relative rates of mutation in the lineages leading to species A and to species B. For example, when a variety of orthologs in mouse (species A) and rat (species B) are referenced against orthologs in humans (species C), the overall KAC and KBC values are nearly identical (Li and Graur, 1991, p. 82). This suggests that the base substitution rates in the lineages leading to present-day mouse and rat have been nearly equal. However, similar analyses suggest that the substitution rate appears to be lower in lineages leading to the primates and lower still in the lineage leading to modern day humans (Table 9.3).
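
The logic of the relative rate test can be sketched as follows. Because the path from the reference species C to the A-B branch point is shared, the difference KAC − KBC reflects only the difference between the A and B lineages. The second function goes beyond the text: it assumes that a directly measured KAB value is also available and that distances are additive along the tree. The input values in the usage example are hypothetical, not data from the text.

```python
# Relative rate test: compare substitution in lineages A and B using an
# outgroup C. K values are substitutions per 100 sites, as in the text.


def relative_rate_difference(k_ac, k_bc):
    """Return K_AC - K_BC; a value near zero suggests similar rates in A and B."""
    return k_ac - k_bc


def lineage_branch_lengths(k_ab, k_ac, k_bc):
    """Estimate substitution along each lineage since the A-B split.

    Assumption (not from the text): distances are additive, so
    K_OA = (K_AB + K_AC - K_BC) / 2 and K_OB = (K_AB + K_BC - K_AC) / 2,
    where O is the common ancestor of A and B.
    """
    k_oa = (k_ab + k_ac - k_bc) / 2
    k_ob = (k_ab + k_bc - k_ac) / 2
    return k_oa, k_ob


if __name__ == "__main__":
    # Hypothetical K values for illustration only.
    k_ab, k_ac, k_bc = 20.0, 52.0, 48.0
    print("K_AC - K_BC:", relative_rate_difference(k_ac, k_bc))
    print("branch lengths (A, B):", lineage_branch_lengths(k_ab, k_ac, k_bc))
```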

Table 9.3. Rates of synonymous substitution per site per year in primates and rodents.

The data in Table 9.3 may suggest that molecular evolution has effectively slowed down for organisms which have long generation times. With hindsight, perhaps this is not so surprising - most mutations arise when DNA is being replicated in gametogenesis (especially in males; see next section). Rodents and monkeys have comparatively shorter generation times than humans, and so will go through more generations per unit time. In addition, it has been suggested that longer-lived animals have a greater ability to repair their DNA than do short-lived species, thereby resulting in lower mutation rates (Britten, 1986).

9.2.7. Higher mutation rates in males are likely to be related to the greater number of germ cell divisions

Since Haldane first observed that most mutations resulting in hemophilia were generated in the male germline, it has been assumed that, at least in humans, mutations are preferentially paternally inherited. Two major approaches have been taken to estimate the relative mutation rates in the male and female germlines:

  • Molecular evolutionary methods. Such methods typically start from known homologous genes on the X and Y chromosomes located outside the pseudoautosomal region. The sequences are compared with orthologs in another species to estimate the rate of synonymous mutations for the X chromosome gene (KSX) and the Y chromosome gene (KSY). Unlike autosomes, the sex chromosomes spend different amounts of time in the two sexes. The great majority of Y chromosome sequences (those outside the pseudoautosomal region - see Section 14.3.1) spend all their time in males. By contrast, as females have two X chromosomes whereas males have one X chromosome, X chromosome sequences spend on average 2/3 of their time in females and 1/3 of their time in males. If we use the symbol α to represent the ratio of the male mutation rate to the female mutation rate, this is equivalent to setting the male mutation rate at α relative to a female mutation rate of 1. For most Y chromosome sequences, the mutation rate is therefore α. For X chromosome sequences it is 2/3 times the mutation rate in female cells (1) plus 1/3 times the mutation rate in males (α), giving a total of 2/3 + α/3. Therefore the observed KSX/KSY ratio = (2/3 + α/3)/α and so can be used to estimate α. This results in estimates of 3–10 for α (see Table 9.4 and Shimmin et al., 1993 for the example of the ZFX and ZFY genes).
  • Direct observation of disease-causing mutations. Clearly, the mutations assessed in this case are a special subset. The approach involves analysing samples from a patient with a de novo mutation and the two parents (the parent who passed on the faulty chromosome is typically identified by typing for markers closely flanking the disease gene). In most cases one would expect the mutation to have been transmitted through the germline (although if only a blood sample is typed, it is possible that some mutations could be postzygotic). In the case of point mutation analyses for disorders where imprinting is not suspected, estimates for α again support a higher mutation rate in males (Table 9.4).

Table 9.4. Examples of sex differences in mutation rate (see Hurst and Ellegren, 1998 and Shimmin et al., 1993).
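
The relation KSX/KSY = (2/3 + α/3)/α given above can be rearranged to estimate α from an observed ratio R: multiplying out gives 3αR = 2 + α, so α = 2/(3R − 1). The sketch below implements both directions. The function names are illustrative, and the check on R simply reflects that the model ratio approaches 1/3 as α becomes very large.

```python
# Male-to-female mutation rate ratio (alpha) from X/Y synonymous rates.


def ksx_over_ksy(alpha):
    """Expected K_SX / K_SY ratio for a given alpha (male rate / female rate)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return (2 / 3 + alpha / 3) / alpha


def alpha_from_ratio(ratio):
    """Invert R = (2/3 + alpha/3) / alpha, giving alpha = 2 / (3R - 1)."""
    if ratio <= 1 / 3:
        raise ValueError("ratio must exceed 1/3 under this model")
    return 2 / (3 * ratio - 1)


if __name__ == "__main__":
    for alpha in (1, 3, 6, 10):
        ratio = ksx_over_ksy(alpha)
        print(f"alpha = {alpha}: expected K_SX/K_SY = {ratio:.3f}, "
              f"recovered alpha = {alpha_from_ratio(ratio):.3f}")
```

Because the expected ratio changes only slowly once α is large, small errors in the measured ratio translate into large uncertainty in α, which is consistent with the wide range of estimates quoted.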

The comparatively high male mutation rate may be due to different factors (Hurst and Ellegren, 1998), but a major contributory factor is thought to be the large sex difference in the number of human germ cell divisions. In females, the number of cell divisions from zygote to fertilized oocyte is constant because all of the oocytes have been formed by the fifth month of development and only two further cell divisions are required to produce the zygote (Figure 9.4A). The number of successive female cell divisions from zygote to mature egg has been variously estimated as 24 (Vogel and Motulsky, 1996) and 31 (Li, 1997, p. 229) and is broadly similar to estimates of 30–31 male cell divisions required from zygote to stem spermatogonia at puberty. Five subsequent cell divisions are required for spermatogenesis but thereafter the spermatogenesis cycle occurs approximately every 16 days, or about 23 cycles per year (Figure 9.4B). This means that in males, the number of cell divisions required to produce sperm is age-dependent. If an average age of 13 is taken for onset of puberty and an average of 25 for male reproductive age, the total number of cell divisions is about 30 + 5 + [23 × (25 − 13)], or about 310 divisions (Figure 9.4). Given that errors in DNA replication/repair provide the great majority of mutations, one might then expect that the male mutation rate would be substantially greater than that of the female.
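
The age dependence can be made explicit with a small sketch of the calculation above. The default values are the estimates quoted in the text; the function name and parameter names are illustrative.

```python
# Estimated number of germ cell divisions preceding a sperm cell,
# as a function of the father's age, using the figures quoted in the text.


def male_germline_divisions(age_years, puberty_age=13, divisions_to_puberty=30,
                            spermatogenesis_divisions=5, cycles_per_year=23):
    """Return divisions from zygote to sperm for a male of the given age."""
    if age_years < puberty_age:
        raise ValueError("age must be at least the age of puberty")
    return (divisions_to_puberty
            + spermatogenesis_divisions
            + cycles_per_year * (age_years - puberty_age))


if __name__ == "__main__":
    for age in (13, 25, 40):
        print(f"age {age}: about {male_germline_divisions(age)} divisions")
```

With the default values the function gives 311 divisions at age 25, matching the "about 310" in the text, against the constant 24 to 31 divisions estimated for the female germline.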

Figure 9.4. The number of cell divisions that are required to produce a human sperm cell is much greater than the number required to produce an egg cell. (A) Human oogenesis occurs only during fetal life and ceases by the time of birth. The total population of germ …

9.3. Genetic mechanisms which result in sequence exchanges between repeats

In addition to very frequent simple mutations, there are several mutation classes which involve sequence exchange between allelic or nonallelic sequences, often involving repeated sequences. For example, tandemly repetitive DNA is prone to deletion/insertion polymorphism whereby different alleles vary in the number of integral copies of the tandem repeat. Such variable number of tandem repeat (VNTR) polymorphisms can occur in the case of repeated units that are very short (microsatellites), intermediate (minisatellites) or large. Different genetic mechanisms can account for VNTR polymorphism depending on the size of the repeating unit (see the following two sections). In addition, interspersed repeats can also predispose to deletions/duplications by a variety of different genetic mechanisms. These are discussed particularly in the context of disease mutations and are therefore presented in Section 9.4.

9.3.1. Slipped strand mispairing can cause VNTR polymorphism at short tandem repeats (microsatellites)

There is considerable variation in the germline mutation rates at microsatellite loci, ranging from an undetectable level up to about 8 × 10⁻³ (Mahtani and Willard, 1993; Weber and Wong, 1993). Novel length alleles at (CA)/(TG) microsatellites and at tetranucleotide marker loci are known to be formed without exchange of flanking markers. This means that they are not generated by unequal crossover (see below). Instead, as new mutant alleles have been observed to differ by a single repeat unit from the originating parental allele (Mahtani and Willard, 1993), the most likely mechanism to explain length variation is a form of exchange of sequence information which commences by slipped strand mispairing. This occurs when the normal pairing between the two complementary strands of a double helix is altered by staggering of the repeats on the two strands, leading to incorrect pairing of repeats. Although slipped strand mispairing can be envisaged to occur in nonreplicating DNA, replicating DNA may offer more opportunity for slippage and hence the mechanism is often also called replication slippage or polymerase slippage (see Figure 9.5). In addition to mispairing between tandem repeats, replication slippage has been envisaged to generate large deletions and duplications by mispairing between noncontiguous repeats and has been suggested to be a major mechanism for DNA sequence and genome evolution (Levinson and Gutman, 1987; see also Dover, 1995). The pathogenic potential of short tandem repeats is considerable (Sections 9.5.1 and 9.5.2).

Figure 9.5. Slipped strand mispairing during DNA replication can cause insertions or deletions. Short tandem repeats are thought to be particularly prone to slipped strand mispairing, i.e. mispairing of the complementary DNA strands of a single DNA double helix.

9.3.2. Large units of tandemly repeated DNA are prone to insertion/deletion as a result of unequal crossover or unequal sister chromatid exchanges

Homologous recombination describes recombination (crossover) occurring at meiosis or, rarely, mitosis between identical or very similar DNA sequences. It usually involves breakage of nonsister chromatids of a pair of homologs and rejoining of the fragments to generate new recombinant strands. Sister chromatid exchange is an analogous type of sequence exchange involving breakage of individual sister chromatids and rejoining fragments that initially were on different chromatids of the same chromosome. Both homologous recombination and sister chromatid exchange normally involve equal exchanges - cleavage and rejoining of the chromatids occurs at the same position on each chromatid. As a result, the exchanges occur between allelic sequences and at corresponding positions within alleles. In the case of intragenic equal crossover between two alleles, a new allele can result which is a fusion gene (or hybrid gene), comprising a terminal fragment from one allele and the remaining sequence of the second allele (Figure 9.6). However, equal sister chromatid exchanges cannot normally produce genetic variation because sister chromatids have identical DNA sequences.

Figure 9.6. Homologous equal crossover can result in fusion genes. The example shows how intragenic equal crossover occurring between alleles on nonsister chromatids can generate novel fusion genes composed of adjacent segments from the two alleles. Note that similar ...

Unequal crossover is a form of recombination in which the crossover takes place between nonallelic sequences on nonsister chromatids of a pair of homologs (Figure 9.7). Often the sequences at which crossover takes place show very considerable sequence homology which presumably stabilizes mispairing of the chromosomes. Because crossover occurs between mispaired nonsister chromatids, the exchange results in a deletion on one of the participating chromatids and an insertion on the other. The analogous exchange between sister chromatids is called unequal sister chromatid exchange (see Figure 9.7). Both mechanisms occur predominantly at locations where the tandemly repeated units are moderate to large in size. In such cases, the very high degree of sequence homology between the different repeats can facilitate pairing of nonallelic repeats on nonsister chromatids or sister chromatids. If chromosome breakage and rejoining occurs while the chromatids are mispaired in this way, an insertion or deletion of an integral number of repeat units will result. Note that such exchanges are reciprocal; both participating chromatids are modified, in one case resulting in an insertion, and in the other case in a complementary deletion.
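
The reciprocal nature of these exchanges can be illustrated with a minimal sketch in which each chromatid is modelled as a list of repeat units. The function name `crossover`, the unit labels and the break positions are hypothetical, chosen only for illustration; real exchanges depend on sequence homology that is not modelled here.

```python
def crossover(chromatid_a, chromatid_b, break_a, break_b):
    """Reciprocal exchange: break each chromatid after the given number of
    repeat units and rejoin the proximal part of one to the distal part
    of the other."""
    recombinant_1 = chromatid_a[:break_a] + chromatid_b[break_b:]
    recombinant_2 = chromatid_b[:break_b] + chromatid_a[break_a:]
    return recombinant_1, recombinant_2

# Two hypothetical four-unit tandem arrays on a pair of chromatids.
array_a = ["A1", "A2", "A3", "A4"]
array_b = ["B1", "B2", "B3", "B4"]

# Equal crossover: both chromatids break at the same position.
equal_1, equal_2 = crossover(array_a, array_b, 2, 2)

# Unequal crossover: the arrays are mispaired by one repeat unit.
unequal_1, unequal_2 = crossover(array_a, array_b, 3, 2)

print(len(equal_1), len(equal_2))      # array lengths are unchanged
print(len(unequal_1), len(unequal_2))  # one array gains a unit, the other loses one
```

In this sketch the mispaired exchange yields one five-unit and one three-unit array, so the insertion on one chromatid is matched by a complementary deletion on the other, and the total number of repeat units is conserved.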

Figure 9.7. Unequal crossover and unequal sister chromatid exchange cause insertions and deletions. The examples illustrate unequal pairing of chromatids within a tandemly repeated array. Unequal crossover involves unequal pairing of nonsister chromatids followed ...

Unequal sister chromatid exchange is thought to be a major mechanism underlying VNTR polymorphism in the rDNA clusters. Unequal crossover is also expected to occur comparatively frequently in complex satellite DNA repeats and at tandemly repeated gene loci. In the latter case, unequal crossover is known to generate pathogenic deletions at some loci (see Section 9.5.3). Such exchanges can also lead to concerted evolution by causing a particular variant to spread through an array of tandem repeats, resulting in homogenization of the repeat units (see Figure 9.8).

Figure 9.8. Unequal crossover in a tandem repeat array can result in sequence homogenization. Note that the initial spread of the novel sequence variant to the same position in the chromosomes of other members of a sexual population can result by random genetic drift ...

Occasionally, unequal crossover and unequal sister chromatid exchanges can occur at regions where there is little homology. This is likely to be the case when such mechanisms first generate a tandemly duplicated locus following mispairing of nonallelic repeats such as two Alu repeats or even smaller elements (Figure 9.9).

Figure 9.9. Tandem gene duplication can result from unequal crossover or unequal sister chromatid exchange, facilitated by short interspersed repeats. The double arrow indicates the extent of the tandem gene duplication of a segment containing gene A and flanking ...

9.3.3. Gene conversion events may be relatively frequent in tandemly repetitive DNA

Gene conversion describes a nonreciprocal transfer of sequence information between a pair of nonallelic DNA sequences (interlocus gene conversion) or allelic sequences (interallelic gene conversion). One of the pair of interacting sequences, the donor, remains unchanged. The other DNA sequence, the acceptor, is changed by having some or all of its sequence replaced by a sequence copied from the donor sequence (Figure 9.10). The sequence exchange is therefore a directional one; the acceptor sequence is modified by the donor sequence, but not the other way round.
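
As a minimal sketch of this directionality, the snippet below treats donor and acceptor as aligned strings of equal length and copies one segment from donor to acceptor. The function name `gene_conversion`, the toy sequences and the segment coordinates are assumptions made for illustration; the molecular mechanism itself is not modelled.

```python
def gene_conversion(donor, acceptor, start, end):
    """Nonreciprocal transfer: the acceptor segment [start:end) is replaced
    by the corresponding donor segment; the donor is returned unchanged."""
    converted = acceptor[:start] + donor[start:end] + acceptor[end:]
    return donor, converted

donor = "AAAAAAAA"      # hypothetical donor sequence
acceptor = "CCCCCCCC"   # hypothetical acceptor sequence

donor_after, acceptor_after = gene_conversion(donor, acceptor, 2, 5)
print(donor_after)     # identical to the original donor
print(acceptor_after)  # carries a donor-derived tract at positions 2-4
```

Unlike a crossover, nothing flows back to the donor: only one of the two sequences is changed.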

Figure 9.10. Gene conversion involves a nonreciprocal sequence exchange between allelic or nonallelic genes. (A) Interallelic gene conversion. Note the nonreciprocal nature of the sequence exchange - the donor sequence is not altered but the acceptor sequence is altered ...

One possible mechanism for gene conversion envisages formation of a heteroduplex between a DNA strand from the donor gene and a complementary strand from the acceptor gene. Following heteroduplex formation, conversion of an acceptor gene segment may occur by mismatch repair - DNA repair enzymes recognize that the two strands of the heteroduplex are not perfectly matched and ‘correct’ the DNA sequence of the acceptor strand to make it perfectly complementary in the converted region to the sequence of the donor gene strand (see Figure 9.10C).

Gene conversion has been well-described in fungi where all four products of meiosis can be recovered and studied (tetrad analysis). In humans and mammals it is not possible to do this and so gene conversion cannot be demonstrated unambiguously in higher organisms (it can never be distinguished from double crossover events, for example, although double crossovers occurring in very close proximity would normally be expected to be extremely unlikely). Despite the difficulty in identifying gene conversion in complex organisms, there are numerous instances in mammalian genomes where an allele at one locus shows a pattern of mutations which strongly resembles those found in alleles at another locus of the same species. Such evidence suggests gene conversion-like exchanges between loci.

Although simple comparisons of two sequences may be suggestive, the evidence for gene conversion is most compelling when a new mutant allele can be compared directly with its progenitor sequence. Certain highly mutable loci lend themselves to this type of analysis. In particular, some hypervariable minisatellite loci have high germline mutation rates (often 1% or more per gamete) and individual repeats often show nucleotide differences so that repeat subclasses can be recognized. Germline mutations can be studied by detecting and characterizing mutant minisatellite alleles in individual gametes. To do this, PCR analysis has been conducted on multiple dilute aliquots of DNA isolated from the sperm of an individual (small pool PCR), where each aliquot is calibrated to contain a few, perhaps 100, input molecules (Jeffreys et al., 1994). The PCR products recovered from individual pools can then be typed to identify any new mutations that result in a novel allele whose length is sufficiently different as to be distinguishable from the progenitor allele. Analyses of the patterns of germline mutation at three such loci have failed to identify exchanges of flanking markers and have shown that most mutations occurring at these loci are polar, involving the preferential gain of a few repeats at one end of a tandem repeat array. There is a bias towards gain of repeats and evidence was obtained for nonreciprocal sequence exchange between alleles, suggesting interallelic gene conversion (Jeffreys et al., 1994). Evidence for interlocus gene conversion has also been obtained in human genes, notably the steroid 21-hydroxylase gene (see Section 9.5.3).

9.4. Pathogenic mutations

9.4.1. There is a high deleterious mutation rate in hominids

Neutral mutations (those which are neither detrimental nor advantageous for the organism carrying them) accumulate throughout the generations at a rate equal to the mutation rate. To get an estimate of the (total) mutation rate is therefore simple: all one needs to do is to measure the rate of change of some presumed neutral sequence (e.g. an intron or a pseudogene). The deleterious mutation rate, by contrast, has been notoriously difficult to measure and no convincing estimate existed for any vertebrate until a study reported by Eyre-Walker and Keightley (1999). They investigated amino acid changes in 46 proteins occurring in the human ancestral line after its divergence from the chimpanzee. If all nonsynonymous substitutions were neutral, 231 new substitutions would have been expected in their sample of genes (given an average neutral mutation rate of 0.0056 nonsynonymous substitutions per nucleotide and a total of 41 471 nucleotides investigated). Instead, only 143 nonsynonymous substitutions were observed and 88 such substitutions were inferred to have been removed by natural selection because they had been deleterious. On the assumption of 60 000 genes, and 240 000 generations since human-chimpanzee divergence, they estimated a deleterious rate of 1.6 mutations per person per generation out of a total of 4.2 mutations per person per generation.
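
The arithmetic behind these figures can be retraced with a short sketch that uses only the numbers quoted above. The variable names are illustrative, and the final step assumes that the deleterious rate is simply the selected-against fraction applied to the quoted total of 4.2, which is how the quoted figures fit together rather than a restatement of the original authors' full calculation.

```python
# Figures quoted in the text (Eyre-Walker and Keightley, 1999).
neutral_rate = 0.0056   # nonsynonymous substitutions per nucleotide if all were neutral
sites = 41_471          # nucleotides investigated
expected_quoted = 231   # expected substitutions as stated in the text
observed = 143          # nonsynonymous substitutions actually observed
total_rate = 4.2        # total mutations per person per generation, as quoted

# Rate x sites gives about 232; the quoted 231 presumably reflects
# rounding of the rate (an assumption, not stated in the text).
expected_calc = neutral_rate * sites

removed = expected_quoted - observed             # substitutions removed by selection
fraction_deleterious = removed / expected_quoted
deleterious_rate = fraction_deleterious * total_rate

print(round(expected_calc, 1))
print(removed, round(fraction_deleterious, 2), round(deleterious_rate, 1))
```

On these inputs about 38% of the expected nonsynonymous substitutions are missing, which corresponds to the quoted 1.6 deleterious mutations per person per generation.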

The estimated deleterious mutation rates in chimpanzees and gorillas were very similar but those for rodent-specific lineages are about one order of magnitude less, possibly because of much smaller numbers of germ cell divisions in rodents. The very high deleterious rate in hominids may even be an underestimate. If the total gene number were 80 000 and an average coding sequence was 1800 nucleotides, the estimated deleterious mutation rate would be 2.5 mutations per person per generation and, on other grounds too, a more likely rate has been considered to be three deleterious mutations per person per generation (Crow, 1999).

So with three genetic deaths per person why are we not extinct? The data would suggest that harmful mutations need to be weeded out in clusters at a time. One way to achieve this would be if natural selection operated such that individuals with the most mutations are preferentially eliminated (e.g. harmful mutations interact). This could only happen in a sexual species where mutations are shuffled each generation by genetic recombination, and so the existence of such a high deleterious mutation rate has been taken as further vindication that sex (meiotic recombination) is an efficient way to eliminate harmful mutations.

9.4.2. Pathogenic mutations are preferentially located at certain types of intragenic DNA sequence

Pathogenic mutations can occur at three types of DNA sequence at a gene locus.

  • The coding sequence of the gene. This is where the great majority of recorded pathogenic mutations have been identified. Those due to nucleotide substitution are, in the vast majority of cases, nonsynonymous substitutions and mostly occur at first and second base positions of codons. However, very rarely, a synonymous codon substitution is not neutral as expected, but may cause disease by activating a cryptic splice site (Section 9.4.5). Because of its relatively high mutability, the CpG dinucleotide is often located at hotspots for pathogenic mutation in coding DNA (Cooper and Youssoufian, 1988). Other hotspots include tandem repeats within coding DNA (see below).
  • Intragenic noncoding sequences. This is restricted to sequences which are necessary for correct expression of the gene, such as important intronic elements, notably the highly conserved GT and AG dinucleotides at the ends of introns, but also conserved elements of the untranslated sequences. Often such mutations represent a small component (~10–15%) of the total pathogenic mutations at a gene locus (Cooper et al., 1995). However, in some disorders pathogenic splicing mutations may be common. In the case of the collagen disorder osteogenesis imperfecta they constitute a very common pathological mutation which is second in frequency only to substitutions leading to replacement of the highly conserved, structurally important glycine residues. The collagen genes have small exons and a comparatively large number of introns (often more than 50 and as many as 106 in the case of the COL7A1 gene), making them exceptional targets for splicing mutations. Occasional pathogenic mutations have been recorded in the 5′ UTR (such as in the case of hemophilia B Leyden) and appear to exert their effect at the transcriptional level. Several examples are also known of pathogenic mutations in the 3′ UTR (see Cooper et al., 1995).
  • Regulatory sequences outside exons. Most mutations located in regulatory sequences have been identified in conserved elements located just upstream of the first exon, notably promoter elements. In addition, other more distantly located regulatory elements may be sites of pathological mutation. For example, deletions which eliminate the β-globin LCR (see Figure 8.23) but leave the β-globin gene and its promoter intact result in almost complete abolition of β-globin gene expression and contribute to β-thalassemia. Clearly, in some cases a gene may be regulated by the product of a distantly related gene. For example, in the case of rare variants of α-thalassemia with mental retardation, the α-globin gene and its promoter may show no evidence of pathological mutation and the disease maps to an X-linked gene which encodes a transcription factor, one of whose target sequences is presumably the α-globin gene (Gibbons et al., 1995).

9.4.3. The mitochondrial genome is a hotspot for pathogenic mutations

Because of the very large size of the human nuclear genome, most mutations occur in nuclear DNA sequences. By comparison, the mitochondrial genome is a small target for mutation (about 1/200 000 of the size of the nuclear genome). Unlike nuclear genes, mitochondrial genes are present in numerous copies (there are thousands of copies of the mtDNA molecule in each human somatic cell; some cells, such as brain and muscle cells, have particularly high oxidative phosphorylation requirements and so more mitochondria). The mtDNA is inherited from the maternal oocyte, which is an exceptional cell with many more mtDNA molecules than somatic cells. Given that a mutation in mitochondrial DNA must arise on a single mtDNA molecule, one might intuitively expect that the chances of a single mtDNA mutation becoming fixed would be very low and the mutation rate correspondingly low. On these grounds, one could anticipate that the proportion of clinical disease due to pathogenic mutation in the mitochondrial genome should be extremely low. Instead, the frequency of ‘mitochondrial disorders’ is rather high (Section 16.6.4) and the mitochondrial genome can be considered to be a mutation hotspot. Different factors can explain this apparent paradox:

  • Differential target size for pathogenic mutation. Pathogenesis is associated with mutations in coding DNA and the mitochondrial genome has a much higher percentage of coding DNA (93%) than found in the nuclear genome (3%). When this is taken into consideration, however, there is still a large imbalance: about 100 Mb of coding DNA in the nuclear genome but only 15.4 kb of coding sequence in the mitochondrial genome, giving a target ratio of 6000:1 in favor of the nuclear genome.
  • High mutation rate in mtDNA. The mitochondrial genome is much more prone to nucleotide change than the nuclear genome. Even though about 100 000 copies of the mitochondrial genome are maternally inherited in the fertilized oocyte, there are mechanisms which permit rapid fixation of mutations in mitochondrial DNA (Box 9.3). The combination of mtDNA instability and a high fixation rate means that the mutation rate in mitochondrial DNA is very high. Mutations have been reported to be fixed in the mitochondrial genomes of animal cells at a rate which is about 10 times greater than that occurring in equivalent sequences in the nuclear genome (Brown et al., 1979). This means that the small recombination-deficient animal mtDNA molecules appear to be evolving remarkably rapidly, corresponding to about 2–4% sequence divergence per million years. In contrast, plant mtDNA molecules are comparatively large (150 kb–2.5 Mb), have introns, engage in recombination and are evolving comparatively slowly.
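
The target-size figures in the first of these two points can be cross-checked with a few lines of arithmetic. The sketch below uses only the quoted values (100 Mb and 15.4 kb of coding DNA, 3% and 93% coding fractions); the genome sizes it derives are back-calculated from those values and are therefore approximations, not figures stated in the text.

```python
nuclear_coding_bp = 100e6   # about 100 Mb of nuclear coding DNA (quoted)
mito_coding_bp = 15.4e3     # 15.4 kb of mitochondrial coding DNA (quoted)
nuclear_coding_fraction = 0.03
mito_coding_fraction = 0.93

# Ratio of coding targets, nuclear : mitochondrial.
target_ratio = nuclear_coding_bp / mito_coding_bp

# Genome sizes implied by the quoted coding fractions (back-calculated).
nuclear_genome_bp = nuclear_coding_bp / nuclear_coding_fraction
mito_genome_bp = mito_coding_bp / mito_coding_fraction
size_ratio = nuclear_genome_bp / mito_genome_bp

print(round(target_ratio))    # roughly the 6000:1 quoted in the text
print(round(mito_genome_bp))  # implied mtDNA size in bp
print(round(size_ratio))      # roughly the 200 000-fold size difference quoted
```

The computed target ratio is closer to 6500:1, so the quoted 6000:1 is best read as an order-of-magnitude figure.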

Box 9.3. How are new mitochondrial mutations fixed (i.e. achieve a frequency of 100% in a population)? We inherit 23 nuclear DNA molecules (chromosomes) from each parent but perhaps as many as 100 000 mtDNA molecules in the maternal oocyte (Piko and Taylor, 1987). ...

Segregation of mitochondrial genotypes in the maternal germ line. Two subpopulations of mtDNA are shown (dark grey and blue). Primordial germ cells do not differ substantially in their levels of heteroplasmy, but a dramatic segregation of mtDNA genotypes ...

The high instability of mtDNA has been postulated to result from several factors. The high rate of production of reactive oxygen intermediates by the respiratory chain is thought to cause substantial oxidative damage to mtDNA which, unlike nuclear DNA, is not protected by histones. The mtDNA also has to undergo many more rounds of replication than chromosomal DNA. Although several well-characterized mtDNA repair systems are now known, some frequent mutations cannot be repaired, including thymidine dimers (Section 9.6).

9.4.4. Many different factors govern the expression of pathogenic mutations

The degree to which a pathogenic mutation results in an aberrant phenotype depends on several factors:

  • The mutation class and the way in which the expression of the mutant gene is altered. This may depend on the location of the mutation within the gene (Table 9.5). Many pathogenic mutations result in abolition or substantial reduction of gene expression, but some lead to inappropriate expression. For example, overexpression of a gene product may cause an abnormal phenotype where gene dosage needs to be carefully regulated, and ectopic expression, that is expression in tissues where the gene is not normally expressed, may also be harmful.
  • The degree to which aspects of the aberrant phenotype are expressed in the heterozygote. The presence of a single normal allele may be sufficient to maintain a clinically normal phenotype (as in recessively inherited disorders), or a milder phenotype when compared with that of mutant homozygotes, as in dominantly inherited disorders where the mutation is a simple loss of function mutation.
  • The degree to which expression of a mutant phenotype is influenced by other gene products. The same mutant allele can have different phenotypic effects on different genetic backgrounds, depending on particular alleles at other gene loci (modifier genes).
  • The proportion and nature of cells in which the mutant gene is present. Generally, mutations which are present in all the cells of an individual (inherited mutations) or in many of them (somatic mutations acquired very early in development) are likely to have a more profound effect than those present in a few cells (somatic mutations which arise at much later stages) or in cell types where the relevant gene is not expressed. Cancers, however, arise from unregulated division of cells produced from a single original mutant cell.
  • The parental origin of the mutation. This is only known to be important in the case of the few genes which are imprinted (see Section 8.5.4 and Box 16.6).
Table 9.5. Effect of location and class of mutation on gene function.


9.4.5. Most splicing mutations alter a conserved sequence required for normal splicing, but some occur in sequences not normally required for splicing

Many genes naturally undergo alternative forms of RNA splicing. In addition, mutations can sometimes produce an aberrant form of RNA splicing which is pathogenic. Sometimes this results in the sequences of whole exons being excluded from the mature RNA (exon skipping) or retention of whole introns. On other occasions, the abnormal splicing pattern may exclude part of a normal exon or result in new exonic sequences. Point mutations which alter a conserved sequence that is normally required for RNA splicing are comparatively common. Occasionally, however, aberrant splicing of a gene can be induced by mutation of other sequence elements which resemble splice donor or splice acceptor sequences but which are not normally involved in splicing (cryptic splice sites).

Mutations which alter important splice site signals

Often such mutations occur at the essentially invariant GT and AG dinucleotides located respectively at the start of an intron (splice donor) or at its end (splice acceptor). Flanking these important signals, however, are other conserved sequence elements (see Figure 1.15) which, if mutated, can also cause aberrant splicing. Mutations which alter such sequences can have different consequences:

  • Failure of splicing causing intron retention. This can occasionally result, for example, when an intron is small and the neighboring sequence lacks alternative legitimate splice sites or cryptic splice sites (sequences which resemble the consensus splice site sequences but which are not normally used by the splicing apparatus; see Figure 9.11A). The introduction of intronic sequence into the coding sequence of a mature mRNA will, at the very least, introduce additional amino acids and may cause a frameshift.
  • Exon skipping. The splicing apparatus uses an alternative legitimate splice site. Mutation of a splice donor sequence results in skipping of the upstream exon while mutation of the splice acceptor sequence results in skipping of the downstream exon (Figure 9.11A). Often, the exclusion of an exon has a profound effect on gene expression: it may result in a frameshift, an unstable RNA transcript, or a nonfunctional polypeptide because of a loss of a critical group of amino acids.

Figure 9.11. Splicing mutations can arise by alteration of conserved splice donor and splice acceptor sequences or by activation of cryptic splice sites. (A) Mutations at conserved splice donor (SD) or splice acceptor (SA) sequences (see Figure 1.15 for consensus ...

In addition to the above, the branch site used in splicing (see Figure 1.15) may be mutated leading to defective splicing.

Mutations of sequences which are not normally important for RNA splicing

Cryptic splice sites coincidentally resemble the sequences of authentic splice sites but are not normally used in splicing, unless a mutation alters the sequence so that the splicing apparatus now recognizes it as a normal splice site. Because individual splice donor and splice acceptor sequences often show some variation from the consensus sequences shown in Figure 1.15, cryptic splice sites may not be difficult to find (e.g. the β-globin gene has quite a variety of cryptic splice sites; see Cooper et al., 1995). The use of an intronic cryptic splice site will introduce new amino acids, while using an exonic cryptic splice site will result in a deletion of coding DNA (Figure 9.11B).

See Figures 9.12 and 9.13 respectively for worked examples of activation of a cryptic splice donor within an exon, and a cryptic splice acceptor within an intron. The former is a cautionary reminder that apparently silent mutations may yet be pathogenic. Note that in some cases mutations which occur within exons but not at cryptic splice sites can also induce skipping of that exon (see next section).

Figure 9.12. When a silent mutation is not silent. This example shows a mutation that was identified in a LGMD2A limb girdle muscular dystrophy patient. The mutation was found in the calpain 3 gene, a known locus for this form of muscular dystrophy, but occurred at ...

Figure 9.13. Mutations can cause abnormal RNA splicing by activation of cryptic splice sites. This figure illustrates activation of a cryptic splice acceptor sequence located within an intron (compare Figure 9.12 which illustrates activation of a cryptic splice donor ...

9.4.6. Mutations that introduce a premature termination codon usually result in unstable mRNA but other outcomes are possible

Several different classes of mutation can introduce a premature termination codon (chain terminating mutations). Nonsense mutations produce a premature termination codon simply by substituting a normal codon with a stop codon. Frameshifting insertions and deletions usually also introduce a premature termination codon not too far downstream of the mutation site. This happens because there is no selection pressure to avoid stop codons in the other translational reading frames and so, given established nucleotide frequencies, at least one stop codon is usually encountered within a stretch of 100 nucleotides downstream of the mutation site. A variety of splice site mutations can also introduce a premature termination codon, e.g. by skipping of a single exon whose length in nucleotides is not divisible by 3. There are several possible consequences of chain-terminating mutations for gene expression:

  • Unstable mRNA. This is by far the most frequent consequence. An mRNA carrying a premature termination codon is usually rapidly degraded in vivo by a form of RNA surveillance known as nonsense-mediated mRNA decay (Hentze and Kulozik, 1999; Culbertson, 1999). This can avoid the potentially lethal consequences of producing a truncated polypeptide which could interfere with vital cell functions.
  • Truncated polypeptide. The production of a polypeptide truncated at the C terminus is a very rare outcome in vivo (the well-known protein truncation test which assays for mutations introducing a premature termination codon is carried out using an in vitro transcription-translation system). Nevertheless, some truncated polypeptides are produced in vivo (see, for example, Lehrman et al., 1987). The effect on gene expression may be difficult to predict and will depend among other things on the extent of the truncation, the stability of the polypeptide product and its ability to interfere with expression of normal alleles.
  • Exon skipping. Some nonsense mutations appear to induce skipping of constitutive exons in vivo. For example, a nonsense mutation in the middle of exon 51 of the FBN1 fibrillin gene (corresponding to the C terminus of the protein) causes that exon to be skipped (Dietz et al., 1993). As a result of exon skipping, the abnormally spliced mRNA uses the normal stop codon and escapes nonsense-mediated mRNA decay, unlike any full-length mRNA which may be produced from the pre-mRNA. The abnormally spliced FBN1 mRNA accumulates and is translated to give a dominant negative protein lacking C-terminal sequences.
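
The statement at the start of this section, that a shifted reading frame usually meets a stop codon within about 100 nucleotides, can be checked roughly. The sketch below assumes equal and independent base frequencies, a simplification that differs from real genomic composition, so the probability is only indicative. The function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def prob_stop_within(nucleotides):
    """Probability of at least one stop codon in a random reading frame of
    the given length, assuming all 64 codons are equally likely."""
    codons = nucleotides // 3
    return 1 - (61 / 64) ** codons

def first_stop(seq, start=0):
    """Index of the first stop codon read in frame from `start`, or None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None

print(round(prob_stop_within(100), 2))

toy = "ATGAAATAGCCC"              # hypothetical sequence
print(first_stop(toy))            # normal frame: ATG AAA TAG ...
print(first_stop(toy, start=1))   # frame shifted by one base: TGA ...
```

Under this simplified model roughly four out of five random 100-nucleotide frames contain a stop codon, which is in line with the text's "usually".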

9.5. The pathogenic potential of repeated sequences

The human genome, like other mammalian genomes, has a very high proportion of DNA sequences that are repeated. Tandem repeats in coding DNA include very short nucleotide repeats, moderately sized repeats and very large repeats that can include whole genes. Depending on the degree of sequence homology between the repeats, tandem repeats are liable to a variety of different genetic mechanisms causing sequence exchange between the repeats (Table 9.6). Often such sequence exchanges result in changes in the number of tandem repeats. A reduction in repeat number can often result in a pathogenic deletion, but expansion by sequence duplication can be pathogenic too (Mazzarella and Schlessinger, 1998). Certain chromosomal regions, notably the subtelomeric and pericentromeric regions, harbor large tracts of duplicated DNA and instability of such regions can predispose to disease (Eichler, 1998). Interspersed repeats can also cause pathogenic mutations by a different variety of mechanisms (see Table 9.6).

Table 9.6. Repeated DNA sequences often contribute to pathogenesis.


9.5.1. Slipped strand mispairing of short tandem repeats predisposes to pathogenic deletions and frameshifting insertions

Insertions and deletions in coding DNA are rare because they usually introduce a translational frameshift. However, occasionally, a series of tandem repeats of a small number of nucleotides occurs by chance in the coding sequence for a polypeptide. Such repeats, like classical microsatellite loci, are comparatively prone to mutation by slipped strand mispairing. As a result, the copy number of tandem repeats is liable to fluctuate, introducing a deletion or an insertion of one or more repeat units. If the mutation occurs in polypeptide-encoding DNA, a resulting deletion will often have a profound effect on gene expression. Frameshifting deletions will normally result in abolition of gene expression. Even if the deletion does not produce a frameshift, deletions of one or more amino acids can still be pathogenic (Figure 9.14). Small frameshifting insertions will also be expected to lead to loss of gene expression and often the insertion is a tandem repeat of sequences flanking it. However, nonframeshifting insertions would often not be expected to be pathogenic, unless the insertion occurs in a critically important region, destabilizing an essential structure or impeding gene function in some way. Note that large triplet repeat expansions can lead to disease by mechanisms that are not understood at present (see next section).
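
Whether a slippage event shifts the reading frame depends only on whether the number of nucleotides gained or lost is a multiple of three. The following sketch makes this explicit; the helper names `is_frameshift` and `change_repeat_count` and the toy sequence are hypothetical and serve only as illustration.

```python
def is_frameshift(unit_length, units_changed):
    """True if gaining or losing this many repeat units shifts the reading frame."""
    return (unit_length * units_changed) % 3 != 0

def change_repeat_count(seq, unit, delta):
    """Return seq with its first tandem run of `unit` (two or more copies)
    lengthened or shortened by `delta` copies."""
    start = seq.find(unit * 2)
    if start == -1:
        raise ValueError("no tandem repeat of this unit found")
    end = start
    while seq.startswith(unit, end):
        end += len(unit)
    copies = (end - start) // len(unit)
    new_copies = max(copies + delta, 0)
    return seq[:start] + unit * new_copies + seq[end:]

toy = "ATGCACACAGGT"                        # hypothetical coding sequence with (CA)3
shortened = change_repeat_count(toy, "CA", -1)
print(shortened, len(toy) - len(shortened))  # one CA unit (2 bp) lost

print(is_frameshift(2, 1))   # loss of one dinucleotide unit
print(is_frameshift(3, 1))   # loss of one trinucleotide unit
print(is_frameshift(2, 3))   # loss of three dinucleotide units (6 bp)
```

Only the first case shifts the frame. The other two remove whole codons, which corresponds to the nonframeshifting deletions described above that can still be pathogenic.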

Figure 9.14. Short tandem repeats are deletion/insertion hotspots. The six deletions illustrated are examples of pathogenic deletions occurring at tandemly repeated units of from 1 to 6 bp and have probably arisen as a result of replication slippage (Figure 9.5).

9.5.2. Rapid large-scale expansion of intragenic triplet repeats can cause a variety of diseases but the mutational mechanism is not well understood

Sometimes microsatellites within or in the immediate vicinity of a gene can expand to considerable lengths and affect gene expression, causing disease. In some cases, a modestly expanded repeat which causes disease may be perfectly stable and be propagated without change in size through several generations. For example, triplet repeat expansion leading to long polyalanine tracts in the HOXD13 gene causes a form of synpolydactyly, probably as a result of unequal crossover (Warren, 1997), but the expanded repeat is stable (Akarsu et al., 1996). In other cases, however, the expanded triplet repeat is unstable and the discovery that human disease can be caused by large-scale expansion of highly unstable trinucleotide repeats was quite unexpected. Studies in other organisms had not revealed precedents for such a phenomenon, but the list of human examples is now considerable (see Box 16.7). In addition to unstable triplet repeat expansion, the majority of disease alleles at the cystatin B gene which cause progressive myoclonus epilepsy involve expansions of a 12 nucleotide repeat (CCCCGCCCCGCG; Lalioti et al., 1998). The pathological mechanisms by which unstable expanded repeats cause disease are discussed in Chapter 16. Here we are concerned with the nature and mechanism of the DNA instability (see also Djian, 1998; Sinden, 1999).

Tandem trinucleotide repeats are not infrequent in the human genome. Although there are 64 possible trinucleotide sequences, when allowance is made for cyclic permutations [(CAG)n = (AGC)n = (GCA)n] and for reading from either strand [5′(CAG)n on one strand = 5′(CTG)n on the other], there are only 10 different trinucleotide repeats (Figure 9.15). Most of these are known as usefully polymorphic microsatellite markers but, in addition, certain repeats show anomalous behavior which can cause abnormal gene expression. In each case, repeats below a certain length are stable in mitosis and meiosis while, above a certain threshold length, the repeats become extremely unstable. These unstable repeats are virtually never transmitted unchanged from parent to child. Both expansions and contractions can occur, but there is a bias towards expansion. The average size change often depends on the sex of the transmitting parent, as well as the length of the repeat. Genes containing unstable expanding trinucleotide repeats fall into two major classes (see Box 16.7):

Figure 9.15. The ten possible trinucleotide repeats. Both DNA strands are shown. All other trinucleotide repeats are cyclic permutations of one or another of these (see text).

  • Genes which show modest expansions of (CAG)n repeats within the coding sequence. Typically, the stable and nonpathological alleles have 10–30 repeats, while unstable pathological alleles have modest expansions, often in the range of 40–200 repeats. Transcription and translation of the gene are not affected by the expansion. The resulting protein product shows a gain of function: its long polyglutamine tract causes it to aggregate within certain cells and kill them.
  • Genes which show very large expansions of a noncoding repeat. For some genes, various types of triplet repeat (e.g. CGG, CCG, CTG, GAA) found in the promoter, the untranslated regions or intronic sequences can undergo very large expansions in such a way as to inhibit gene expression, causing loss of function. Typically, the stable and nonpathological alleles have 5–50 repeats, while unstable pathological alleles have several hundreds or thousands of copies (see Box 16.7).
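The figure of ten distinct trinucleotide repeats quoted earlier in this section (Figure 9.15) can be checked by direct enumeration. The following Python sketch is not taken from the original text; the function names are hypothetical. It assumes that the three-base homopolymers (AAA, CCC, GGG, TTT) are set aside because they are mononucleotide runs rather than true trinucleotide repeats, and it treats two repeat units as equivalent if one is a cyclic permutation of the other or of its reverse complement.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def rotations(seq: str) -> list[str]:
    """All cyclic permutations of a repeat unit, e.g. CAG, AGC, GCA."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def canonical_repeat_unit(unit: str) -> str:
    """Pick one representative (alphabetically first) for a repeat class."""
    candidates = rotations(unit) + rotations(reverse_complement(unit))
    return min(candidates)


def distinct_trinucleotide_repeats() -> list[str]:
    classes = set()
    for bases in product("ACGT", repeat=3):
        unit = "".join(bases)
        if len(set(unit)) == 1:
            continue  # assumption: homopolymers are mononucleotide repeats
        classes.add(canonical_repeat_unit(unit))
    return sorted(classes)


if __name__ == "__main__":
    repeats = distinct_trinucleotide_repeats()
    print(len(repeats), repeats)
```

With these assumptions the 60 non-homopolymer trinucleotides collapse into 20 classes under cyclic permutation, and these pair off into 10 classes when the complementary strand is taken into account.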

Intergenerational changes are normally reported as parent-child comparisons of blood lymphocyte DNA. There is little information about when in gametogenesis, fertilization or embryogenesis the changes arise. Limited studies of sperm show that highly expanded DM (myotonic dystrophy) and FRAXA (fragile-X syndrome) repeats are not transmitted by affected males, although modest expansions can be. The largest expansions in Huntington disease (which, however, are small compared with large FRAXA or DM expansions) are seen in sperm, consistent with the observation that the severest cases inherit the disease from their father. At least for FRAXA, DM and Kennedy disease, the expanded repeats are mitotically unstable, so that a blood sample shows a smear of heterogeneous expanded repeat sizes. However, in vitro, even large repeats are stable. Thus, whatever the mechanism, it is not operative in all cells.

The basis of the unstable expansions is very largely unknown and this type of mutagenic event has not been identified thus far in genetically tractable organisms such as E. coli, yeast or Drosophila. There is also evidence that the unstable expansion mechanism may not have a parallel in some other mammals such as mice. Human transgenes containing long trinucleotide repeats show virtually no instability after being propagated through several generations in transgenic mice whereas the same sequences may show a 100% probability of expansion when transmitted in the human germline (Djian, 1998). Investigations have also suggested that arrays of triplet repeats may be able to form alternative DNA structures, such as DNA hairpins, triplex DNA and quadruplex DNA (Sinden, 1999) but their significance if formed in vivo is unknown. The repeats have also been envisaged as possible protein-binding sites and protein-binding at the RNA level has also been envisaged to contribute to pathogenesis in some cases, notably in myotonic dystrophy (Philips et al., 1998).

Slipped strand mispairing (see Figure 9.5) is likely to be a component of the expansion mechanism, given the observation that interrupted repeats appear to be stable and only homogeneous repeats are unstable. For example, in spinocerebellar ataxia type 1, 123/126 normal sized CAG repeats were interrupted by one or two CAT triplets, while 30/30 expanded alleles contained no interruption (Chung et al., 1993; see Figure 9.16). One problem with all these mispairing mechanisms is that they should produce contractions as readily as expansions, and this is not seen. Instead, after a certain threshold size, there appears to be a clear bias towards continued expansion of the size of the repeat unit array. Because understanding of trinucleotide repeats is progressing very rapidly at the time of writing, the reader is advised to consult a recent review for more information.
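The distinction between interrupted and homogeneous repeat tracts can be illustrated with a small sketch. The Python code below is an illustration only: the function names and the two example alleles are hypothetical and are not the sequences reported by Chung et al. (1993). It assumes the tract is supplied in frame as a whole number of triplets and reports whether any triplet differs from the repeat unit, together with the longest uninterrupted run.

```python
def split_triplets(tract: str) -> list[str]:
    """Split an in-frame repeat tract into consecutive triplets."""
    if len(tract) % 3 != 0:
        raise ValueError("tract length must be a multiple of 3")
    return [tract[i:i + 3] for i in range(0, len(tract), 3)]


def is_interrupted(tract: str, unit: str = "CAG") -> bool:
    """True if any triplet in the tract differs from the repeat unit."""
    return any(triplet != unit for triplet in split_triplets(tract))


def longest_pure_run(tract: str, unit: str = "CAG") -> int:
    """Length, in repeat units, of the longest uninterrupted run."""
    longest = current = 0
    for triplet in split_triplets(tract):
        current = current + 1 if triplet == unit else 0
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    # Hypothetical alleles, invented for illustration only.
    normal_allele = "CAG" * 12 + "CAT" + "CAG" + "CAT" + "CAG" * 14
    expanded_allele = "CAG" * 45
    for name, allele in [("normal", normal_allele), ("expanded", expanded_allele)]:
        print(name, is_interrupted(allele), longest_pure_run(allele))
```

In this toy example the normal allele has 29 triplets in total but its longest pure CAG run is only 14 units, whereas the expanded allele is a single homogeneous run of 45 units.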

Figure 9.16. Uninterrupted triplet repeats are more prone to expansion. Analysis of the SCA1 spinocerebellar ataxia gene by Chung et al. (1993) showed that all the presumed stable alleles from normal subjects had interrupted repeats except the three with the shortest …

9.5.3. Tandemly repeated and clustered gene families may be prone to pathogenic unequal crossover and gene conversion-like events

Many human and mammalian gene clusters contain nonfunctional pseudogenes which may be closely related to functional gene members. Interlocus sequence exchanges between pseudogenes and functional genes can result in disease by removing or altering some or all of the sequence of a functional gene. For example, unequal crossover (or unequal sister chromatid exchange) between a functional gene and a related pseudogene can result in deletion of the functional gene or the formation of fusion genes containing a segment derived from the pseudogene. Alternatively, the pseudogene can act as a donor sequence in gene conversion events and introduce deleterious mutations into the functional gene.

The classical example of pathogenesis due to gene-pseudogene exchanges is steroid 21-hydroxylase deficiency, where over 95% of pathogenic mutations arise as a result of sequence exchanges between the functional 21-hydroxylase gene, CYP21B, and a very closely related pseudogene, CYP21A. The two genes occur on tandemly repeated DNA segments approximately 30 kb long which also contain other duplicated genes, notably the complement C4 genes, C4A and C4B (Figure 9.17). Large pathogenic deletions uniformly result in removal of about 30 kb of DNA, corresponding to one repeat unit length, and analysis of de novo 21-hydroxylase deficiency mutations has provided strong evidence for pathogenic deletions arising as a result of meiotic unequal crossover (Sinnott et al., 1990).
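Why unequal crossover removes exactly one repeat unit can be seen in a simple model. The Python sketch below is a deliberately crude illustration with hypothetical names: each chromatid is represented as a list of repeat units, and the exchange is assumed to occur within a pair of misaligned units, producing a hybrid unit. It makes no attempt to model the real sequence or the position of the breakpoint.

```python
def unequal_crossover(chromatid_1, chromatid_2, index_1, index_2):
    """Recombine unit index_1 of chromatid_1 with unit index_2 of chromatid_2.

    Assumption: the crossover point lies inside the two paired units, so
    each product carries one hybrid unit at the site of exchange.
    """
    hybrid_a = f"{chromatid_1[index_1]}/{chromatid_2[index_2]}"
    hybrid_b = f"{chromatid_2[index_2]}/{chromatid_1[index_1]}"
    product_a = chromatid_1[:index_1] + [hybrid_a] + chromatid_2[index_2 + 1:]
    product_b = chromatid_2[:index_2] + [hybrid_b] + chromatid_1[index_1 + 1:]
    return product_a, product_b


if __name__ == "__main__":
    UNIT_KB = 30  # approximate repeat unit length given in the text
    normal = ["CYP21A-C4A", "CYP21B-C4B"]
    # The first unit of one chromatid mispairs with the second unit of the other.
    deletion, duplication = unequal_crossover(normal, list(normal), 0, 1)
    print(deletion, (len(deletion) - len(normal)) * UNIT_KB, "kb")
    print(duplication, (len(duplication) - len(normal)) * UNIT_KB, "kb")
```

In this model one product retains a single hybrid unit, a net loss of one unit of about 30 kb, while the reciprocal product carries three units.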

Figure 9.17. Almost all 21-hydroxylase gene mutations are due to sequence exchange with a closely related pseudogene. The duplicated complement C4 genes and steroid 21-hydroxylase genes are located on tandem 30 kb repeats which show about 97% sequence identity. Both …

Point mutations account for about 75% of pathogenic mutations, and virtually all of them are copied from deleterious mutations in the pseudogene, suggesting a gene conversion mechanism (Figures 9.17 and 9.18). Analysis of one such mutation which arose de novo suggests that the conversion tract is a maximum of 390 bp (Collier et al., 1993). Gene conversion events are also found in the duplicated C4 genes, both of which are normally expressed. A likely priming event for conversions in the CYP21-C4 gene cluster is unequal pairing of chromatids so that a CYP21A-C4A unit pairs with a CYP21B-C4B unit (Figure 9.17).
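The outcome of a gene conversion event of this kind can be sketched as a non-reciprocal copy of a short tract from donor to acceptor. The Python example below is illustrative only: the function name is hypothetical and the two short sequences are invented, not CYP21 sequences. It assumes the two sequences are already aligned and of equal length.

```python
def gene_conversion(acceptor: str, donor: str, start: int, tract_length: int) -> str:
    """Copy a tract from the donor into the acceptor; the donor is unchanged.

    Assumption: acceptor and donor are aligned, equal-length sequences.
    """
    if len(acceptor) != len(donor):
        raise ValueError("sequences must be aligned and of equal length")
    end = start + tract_length
    return acceptor[:start] + donor[start:end] + acceptor[end:]


def differences(seq_a: str, seq_b: str) -> list[int]:
    """Positions (0-based) at which two aligned sequences differ."""
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


if __name__ == "__main__":
    # Invented toy sequences; the pseudogene differs at two positions.
    functional = "ATGGCCTTGACCGGA"
    pseudogene = "ATGGCCTAGACCGCA"
    converted = gene_conversion(functional, pseudogene, start=6, tract_length=3)
    print(converted, differences(functional, converted))
```

Because the conversion tract is short, only the pseudogene difference lying within it is transferred; the second difference, outside the tract, is not copied. This is consistent with the observation that individual pathogenic point mutations, rather than the whole pseudogene sequence, appear in the functional gene.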

Figure 9.18. Pathogenic point mutations in the steroid 21-hydroxylase gene originate by copying sequences from the 21-hydroxylase pseudogene. The copying is thought to involve a gene conversion-like mechanism (see Figures 9.10C and 9.17).

9.5.4. Interspersed repeats often predispose to large deletions and duplications

Short direct repeats

In several cases, the endpoints of deletions are marked by very short direct repeats. For example, the breakpoints in numerous pathogenic deletions in mtDNA occur at perfect or almost perfect short direct repeats. Of these, the most common is a deletion of 4977 bp which has been found in multiple patients with Kearns-Sayre syndrome, an encephalomyopathy characterized by external ophthalmoplegia, ptosis, ataxia and cataract. The deletion results in elimination of the intervening sequence between two perfect 13 bp repeats and loss of the sequence of one of the repeats (Figure 9.19). The mitochondrial genome is recombination-deficient and Shoffner et al. (1989) have postulated that such deletions arise by a replication slippage mechanism, similar to that occurring at short tandem repeats (see Figure 9.5). Partial duplications of the mitochondrial genome are also distinctive features of certain diseases, notably Kearns-Sayre syndrome. The ends of the duplicated sequences, like those of the common deletions, are often marked by short direct repeats, and the mechanisms of duplication and deletion appear to be closely related (Poulton and Holt, 1994).
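The sequence outcome of a deletion between two direct repeats can be sketched as follows. This Python example is an illustration under stated assumptions: the function name is hypothetical, the toy sequence and 6 bp repeat are invented (the real mtDNA repeats are 13 bp), and the code models only the end product, not the slippage mechanism itself.

```python
def delete_between_direct_repeats(sequence: str, repeat: str) -> str:
    """Remove the sequence between two direct repeats plus one repeat copy.

    The first two non-overlapping occurrences of the repeat are used as
    the deletion endpoints; a single copy of the repeat is retained.
    """
    first = sequence.find(repeat)
    if first == -1:
        raise ValueError("repeat not found")
    second = sequence.find(repeat, first + len(repeat))
    if second == -1:
        raise ValueError("second copy of the repeat not found")
    return sequence[:first] + sequence[second:]


if __name__ == "__main__":
    # Invented toy sequence: flank + repeat + intervening + repeat + flank.
    repeat = "ACCTCC"
    sequence = "GGAA" + repeat + "TTTTGGGG" + repeat + "CATG"
    deleted = delete_between_direct_repeats(sequence, repeat)
    print(deleted, len(sequence) - len(deleted))
```

The number of nucleotides lost equals the length of the intervening sequence plus the length of one repeat, which is why one copy of the repeat remains at the deletion junction.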

Figure 9.19. Short direct repeats mark the endpoints of many pathogenic deletions in the mitochondrial genome. Scale represents nucleotide position in the mitochondrial genome. Position 1 occurs within the D loop region and numbering increases in a clockwise direction …

The Alu repeat as a recombination hotspot

Some large-scale deletions and insertions may be generated by pairing of nonallelic interspersed repeats, followed by breakage and rejoining of chromatid fragments. For example, the Alu repeat occurs approximately once every 4 kb and mispairing between such repeats has been suggested to be a frequent cause of deletions and duplications. Some large genes have many internal Alu sequences in their introns or untranslated sequences, making them liable to frequent internal deletions and duplications. For example, the 45-kb low density lipoprotein receptor gene has a relatively high density of Alu repeats (approximately one every 1.6 kb). A very high frequency of pathogenic deletions in this gene are likely to involve an Alu repeat, usually at both endpoints, and occasional pathogenic intragenic duplications also involve Alu repeats (Hobbs et al., 1990). Such observations have suggested a general role for Alu sequences in promoting recombination and recombination-like events. Initial gene duplications in the evolution of clustered multigene families may often have involved an unequal crossover event between Alu repeats or other dispersed repetitive elements. It should be noted, however, that some Alu-rich genes do not appear to be loci for frequent Alu-mediated recombination.

9.5.5. Pathogenic inversions can be produced by intrachromatid recombination between inverted repeats

Occasionally, clustered inverted repeats with a high degree of sequence identity may be located within or close to a gene. The high degree of sequence similarity between inverted repeats may predispose to pairing of the repeats by a mechanism that involves a chromatid bending back upon itself. Subsequent chromatid breakage at the mispaired repeats and rejoining can then result in an inversion, in much the same way as the natural mechanism used for the production of some immunoglobulin κ light chains (see Figure 8.28).

The classic example of pathogenic inversions is a mutation which accounts for more than 40% of cases of severe hemophilia A. Intron 22 of the factor VIII gene, F8, contains a CpG island from which two internal genes are transcribed: F8A in the opposite direction to the host gene F8, and F8B in the same direction as F8 (see Figure 9.20). F8A belongs to a gene family with two other closely related members located several hundred kilobases upstream of the F8 gene and transcribed in the opposite direction to F8A. As a result, the region between the F8A gene and the other two members is susceptible to inversions: the F8A gene can pair with either of the other two members on the same chromatid, and subsequent chromatid breakage and rejoining in the region of the paired repeats results in an inversion which disrupts the factor VIII gene (Lakich et al., 1993; see Figure 9.20).
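The effect of recombination between inverted repeats on the intervening sequence can be shown with a toy model. The Python sketch below uses hypothetical function names and an invented short sequence, not the F8 locus. It assumes a single pair of perfectly matching inverted repeats on one strand and represents the inversion as reverse complementation of the segment spanning the two repeats.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_between_inverted_repeats(sequence: str, repeat: str) -> str:
    """Invert the segment bounded by a repeat and its inverted copy.

    Assumption: the inverted copy is the exact reverse complement of the
    repeat and lies downstream of it on the same strand.
    """
    left = sequence.find(repeat)
    if left == -1:
        raise ValueError("repeat not found")
    inverted_copy = reverse_complement(repeat)
    right = sequence.find(inverted_copy, left + len(repeat))
    if right == -1:
        raise ValueError("inverted copy of the repeat not found")
    end = right + len(repeat)
    return sequence[:left] + reverse_complement(sequence[left:end]) + sequence[end:]


if __name__ == "__main__":
    # Invented toy sequence: flank + repeat + interior + inverted repeat + flank.
    repeat = "GATTACA"
    interior = "CCGGAAT"
    sequence = "TT" + repeat + interior + reverse_complement(repeat) + "GG"
    print(invert_between_inverted_repeats(sequence, repeat))
```

The repeats themselves read the same after the event, but the interior segment is now present in the opposite orientation, so any gene that spans one of the repeats is split into two parts facing in different directions.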

Figure 9.20. Inversions disrupting the factor VIII gene result from intrachromatid recombination between inverted repeats. For the sake of clarity only exons 22, 23 and the first and last exons (1 and 26) of the factor VIII gene (F8; open box) are shown. Intron 22 …

9.5.6. DNA sequence transposition is not uncommon and can cause disease

As described in Sections 7.4.4 to 7.4.6, a proportion of moderately and highly repeated interspersed elements are capable of transposition via an RNA intermediate. Defective gene expression due to DNA transposition is comparatively rare and represents only a small component of molecular pathology. However, several examples have been recorded of genetic deficiency due to insertional inactivation by retrotransposons. For example, in one study, hemophilia A was found to arise in two out of 140 unrelated patients as a result of a de novo insertion of a LINE-1 (Kpn) repeat into an exon of the factor VIII gene (Kazazian et al., 1988). Other instances are known of insertional inactivation by an actively transposing Alu element, as in a case of neurofibromatosis type 1 (Wallace et al., 1991). Additionally, a number of other examples have been recorded of pathogenesis due to intragenic insertion of undefined DNA sequences.

9.6. DNA repair

DNA in cells suffers a wide range of damage:

  • purine bases are lost by spontaneous fission of the base-sugar link;
  • cytosines, and occasionally adenines, spontaneously deaminate to produce uracil and hypoxanthine respectively;
  • many chemicals, for example alkylating agents, form adducts with DNA bases;
  • reactive oxygen species in the cell attack purine and pyrimidine rings;
  • ultraviolet light causes adjacent thymines to form a stable chemical dimer;
  • mistakes in DNA replication result in incorporation of a mismatched base;
  • ionizing radiation causes single- or double-strand breaks;
  • mistakes in replication or recombination leave strand breaks in DNA.

All these lesions must be repaired if the cell is to survive. The importance of effective DNA repair systems is highlighted by the severe diseases affecting people with deficient repair systems (see below).

9.6.1. DNA repair usually involves cutting out and resynthesizing a whole area of DNA surrounding the damage

To cope with all these forms of damage, cells must be capable of several different types of DNA repair (for reviews, see the October 1995 issue of Trends in Biochemical Sciences). DNA repair seldom involves simply undoing the change that caused the damage. Almost always a stretch of DNA containing the damaged nucleotide(s) is excised and the gap filled by resynthesis. There are at least five main types of DNA repair in human cells:

  • Direct repair reverses the DNA damage. A specific enzyme is able to dealkylate O6-alkyl guanine directly. In bacteria thymine dimers can be removed in a photoreactivation reaction that depends on visible light and an enzyme, photolyase. Mammals possess enzymes related to photolyase, but use them for a quite different purpose, to control their circadian clock (Van der Horst et al., 1999).
  • Base excision repair (BER) uses glycosidase enzymes to remove abnormal bases. An endonuclease, AP endonuclease, cuts the sugar-phosphate backbone at the position of the missing base. A few nucleotides of the DNA strand are stripped back by exonucleases, the gap is filled by resynthesis, and the remaining nick is sealed by DNA ligase III. The same process is used to repair spontaneous depurination. Interestingly, no human diseases caused by defective BER are known. Maybe any such defect would be lethal, since BER corrects much the commonest type of DNA damage.
  • Nucleotide excision repair (NER) removes thymine dimers and large chemical adducts. Figure 9.21 illustrates the process. Defects in NER cause the autosomal recessive disease xeroderma pigmentosum (XP; Lambert et al., 1998). Seven complementation groups, XPA-XPG, have been defined by cell fusion studies. XP patients are exceedingly sensitive to UV light. Sun-exposed skin develops thousands of freckles, many of which progress to skin cancer.
  • Post-replication repair is required to correct double-strand breaks (Haber, 1999). The usual mechanism is a gene conversion-like process (recombinational repair), where a single strand from the homologous chromosome invades the damaged DNA helix. Alternatively, broken ends are rejoined regardless of their sequence, a desperate measure that is likely to cause mutations. The eukaryotic machinery for recombination repair is less well defined than the excision repair systems. Human genes involved in this pathway include NBS (mutated in Nijmegen breakage syndrome, MIM 251260, Section 12.4.1), BLM (mutated in Bloom syndrome, MIM 210900) and the BRCA2 and maybe BRCA1 breast cancer susceptibility genes (Section 19.5.3; Zhang et al., 1998).
  • Mismatch repair corrects mismatched base pairs caused by mistakes in DNA replication. Cells deficient in mismatch repair have mutation rates 100–1000 times higher than normal, with a particular tendency to replication slippage in homopolymeric runs (Figure 9.5). In humans the mechanism involves at least five proteins and defects cause hereditary nonpolyposis colon cancer (Section 18.7.1; Figure 18.17).

Figure 9.21. A possible scheme for nucleotide excision repair in humans. (A) XPA protein recognizes damaged DNA and binds to it, directly or by binding to RPA, a single-strand binding protein. (B) The DNA-XPA-RPA complex recruits the TFIIH transcription factor. TFIIH …

All these systems, except for direct repair, require exo- and endonucleases, helicases, polymerases and ligases, usually acting in multiprotein complexes that have some components in common. Sorting out the individual pathways has been greatly aided by the very strong conservation of repair mechanisms across the whole spectrum of life. Not only the reaction mechanisms but also the protein structures and even gene sequences are often conserved from E. coli to man. A downside of the conservation is a confusing gene nomenclature, referring sometimes to human diseases (XPA etc.), sometimes to yeast mutants (RAD genes) and sometimes to mammalian cell complementation systems (ERCC = excision repair cross-complementing) - for example XPD, ERCC2 and RAD3 are the same gene in man, mouse and yeast. Generally eukaryotes have multiple proteins corresponding to each single protein in E. coli, so that, for example, nucleotide excision repair requires six proteins in E. coli but at least 30 in mammals.

9.6.2. DNA repair systems share components and processes with the transcription and recombination machinery

As well as sharing components with each other, many repair systems share components with the machinery for DNA replication, transcription and recombination. DNA polymerases and ligase are required for both DNA replication and resynthesis after excision of a defect. The recombination machinery is involved in double-strand break repair. The link with transcription is particularly intriguing (Lehmann, 1995). The general transcription factor TFIIH is a multiprotein complex that includes the XPB and XPD proteins. TFIIH exists in two forms. One form is concerned with general transcription and the other with repair, probably specifically repair of transcriptionally active DNA. This system is deficient in two rare diseases, Cockayne syndrome (CS; MIM 216400) and trichothiodystrophy (TTD; MIM 601675). Clinically and in cell biology, CS and TTD both overlap XP, and in some cases the same genes are responsible, but CS and TTD patients have developmental defects that presumably reflect defective transcription, and they do not have the cancer susceptibility of XP patients.

9.6.3. Hypersensitivity to agents that damage DNA is often the result of an impaired cellular response to DNA damage, rather than defective DNA repair

Many human diseases that involve hypersensitivity to DNA-damaging agents, or a high level of cellular DNA damage, are not caused by defects in the DNA repair systems themselves, but by a defective cellular response to DNA damage. Normal cells react to DNA damage by stalling progress through the cell cycle at a checkpoint until the damage has been repaired, or triggering apoptosis if the damage is irreparable. Part of the machinery for doing this involves the ATM protein. The role of ATM is described in Section 18.7.3. Briefly, it senses DNA damage and relays the signal to the p53 protein, the ‘guardian of the genome’. People with no functional ATM have ataxia telangiectasia (MIM 208900; Lambert et al., 1998). Their cells are hypersensitive to radiation, and they have chromosomal instability and a high risk of malignancy, but the DNA repair machinery itself is intact. Fanconi anemia (MIM 227650) is another heterogeneous group of diseases (at least five complementation groups) marked by defective responses to DNA damage, without specific defects in DNA repair.

Further reading

  1. Cooper DN, Krawczak M (1993) Human Gene Mutation. BIOS Scientific Publishers, Oxford.
  2. Li W-H (1997) Molecular Evolution. Sinauer Associates, Inc., Sunderland, MA.

References

  1. Akarsu A N, Stoilov I, Yilmaz E, Sayli B S, Sarfarazi M. Genomic structure of HOXD13 gene: a nine polyalanine duplication causes synpolydactyly in two unrelated families. Hum. Mol. Genet. (1996);5:945–952. [PubMed: 8817328]
  2. Ayala F J. Molecular clock mirages. Bioessays. (1999);21:71–75. [PubMed: 10070256]
  3. Blok R B, Gook D A, Thorburn D R, Dahl H -H M. Skewed segregation of the mtDNA nt 8993 (T→G) mutation in human oocytes. Am. J. Hum. Genet. (1997);60:1495–1501. [PMC free article: PMC1716104] [PubMed: 9199572]
  4. Britten R J. Rate of DNA sequence evolution between different taxonomic groups. Science. (1986);231:1393–1398. [PubMed: 3082006]
  5. Brown W M, George M Jr, Wilson A C. Rapid evolution of animal mitochondrial DNA. Proc. Natl Acad. Sci. USA. (1979);76:1967–1971. [PMC free article: PMC383514] [PubMed: 109836]
  6. Cairns J. Mutation selection and the natural history of cancer. Nature. (1975);255:197–200. [PubMed: 1143315]
  7. Chung M Y, Ranum L P W, Duvick I A, Servadio A, Zoghbi H Y, Orr H T. Evidence for a mechanism predisposing to intergenerational CAG repeat instability in spinocerebellar ataxia type 1. Nature Genet. (1993);5:254–258. [PubMed: 8275090]
  8. Collier P S, Tassabehji M, Sinnott P J, Strachan T. A de novo pathological point mutation at the 21-hydroxylase locus: implications for gene conversion in the human genome. Nature Genet. (1993);3:260–265. [Erratum published in Nature Genet., 4, 101.] [PubMed: 8485582]
  9. Collins D W, Jukes T H. Rates of transition and transversion in coding sequences since the human-rodent divergence. Genomics. (1994);20:386–396. [PubMed: 8034311]
  10. Cooper DN, Krawczak M (1993) Human Gene Mutation. BIOS Scientific Publishers, Oxford.
  11. Cooper D N, Youssoufian H. The CpG dinucleotide and human genetic disease. Hum. Genet. (1988);78:151–155. [PubMed: 3338800]
  12. Cooper D N, Smith B A, Cooke H J, Niemann S, Schmidtke J. An estimate of unique DNA sequence heterozygosity in the human genome. Hum. Genet. (1985);69(3):201–205. [PubMed: 2984104]
  13. Cooper DN, Krawczak M, Antonarakis SE (1995) The nature and mechanisms of human gene mutation. In: Metabolic and Molecular Bases of Inherited Disease, 7th edn. (C Scriver, AL Beaudet, WS Sly, D Valle eds), pp. 259–291. McGraw-Hill, New York.
  14. Crow J F. The odds of losing at genetic roulette. Nature. (1999);397:293–294. [PubMed: 9950420]
  15. Culbertson M R. RNA surveillance. Trends Genet. (1999);15:74–80. [PubMed: 10098411]
  16. Dietz H C, Valle D, Francomano C, Kendzior R J Jr, Pyeritz R E, Cutting G R. The skipping of constitutive exons in vivo induced by nonsense mutations. Science. (1993);259:680–683. [PubMed: 8430317]
  17. Djian P. Evolution of simple repeats in DNA and their relation to human disease. Cell. (1998);94:155–160. [PubMed: 9695944]
  18. Dover G A. Slippery DNA runs on and on and on… Nature Genet. (1995);10:254–256. [PubMed: 7670459]
  19. Dubrova Y E, Nesterov V N, Krouchinsky N G, Ostapenko V N, Neumann R, Neil D L, Jeffreys A J. Human minisatellite mutation rate after the Chernobyl accident. Nature. (1996);380:683–686. [PubMed: 8614461]
  20. Eichler E E. Masquerading repeats: paralogous pitfalls of the human genome. Genome Res. (1998);8:758–762. [PubMed: 9724321]
  21. Eyre-Walker A, Keightley P D. High genomic deleterious mutation rates in hominids. Nature. (1999);397:344–347. [PubMed: 9950425]
  22. Gibbons R J, Picketts D J, Villard L, Higgs D R. Mutations in a putative global transcriptional regulator cause X-linked mental retardation with α-thalassemia (ATR-X syndrome). Cell. (1995);80:837–845. [PubMed: 7697714]
  23. Gibbs P E M, Witke W F, Dugaiczyk A. The molecular clock runs at different rates among closely related members of a gene family. J. Mol. Evol. (1998);46:552–561. [PubMed: 9545466]
  24. Haber J E. Gatekeepers of recombination. Nature. (1999);398:665–667. [PubMed: 10227286]
  25. Hauswirth W, Laipis P. Mitochondrial DNA polymorphism in a maternal lineage of Holstein cows. Proc. Natl Acad. Sci. USA. (1982);79:4686–4690. [PMC free article: PMC346741] [PubMed: 6289312]
  26. Hentze M W, Kulozik A E. A perfect message: RNA surveillance and nonsense-mediated decay. Cell. (1999);96:307–310. [PubMed: 10025395]
  27. Hobbs H H, Russell D W, Brown M S, Goldstein J L. The LDL receptor locus in familial hypercholesterolaemia: mutational analysis of a membrane protein. Annu. Rev. Genet. (1990);24:133–170. [PubMed: 2088165]
  28. Howell N. mtDNA recombination: what do in vitro data mean? Am. J. Hum. Genet. (1997);61:19–22. [PMC free article: PMC1715851] [PubMed: 9245980]
  29. Hurst L D, Ellegren H. Sex biases in the mutation rate. Trends Genet. (1998);14:446–452. [PubMed: 9825672]
  30. Jeffreys A, Tamaki K, MacLeod A, Monckton D G, Neil D L, Armour J A L. Complex gene conversion events in germline mutation at human minisatellites. Nature Genet. (1994);6:136–145. [PubMed: 8162067]
  31. Jenuth J P, Peterson A C, Fu K, Shoubridge E A. Random genetic drift in the female germline explains the rapid segregation of mammalian mitochondrial DNA. Nature Genet. (1996);14:146–150. [PubMed: 8841183]
  32. Kazazian H H, Wong C, Youssoufian H, Scott A F, Phillips D G, Antonarakis S E. Haemophilia A resulting from de novo insertion of L1 sequences represents a novel mechanism for mutation in man. Nature. (1988);332:164–166. [PubMed: 2831458]
  33. Koehler C M, Lindberg G L, Brown D R, Beitz D C, Freeman A E, Mayfield J E, Myers A M. Replacement of bovine mitochondrial DNA by a sequence variant within a single generation. Genetics. (1991);129:247–255. [PMC free article: PMC1204572] [PubMed: 1682213]
  34. Kumar S, Hedges S B. A molecular timescale for vertebrate evolution. Nature. (1998);392:917–920. [PubMed: 9582070]
  35. Lakich D, Kazazian H H Jr, Antonarakis S E, Gitschier J. Inversions disrupting the factor VIII gene are a common cause of severe haemophilia A. Nature Genet. (1993);5:236–241. [PubMed: 8275087]
  36. Lalioti M D, Scott H S, Buresi C, Rossier C, Bottani A, Morris M A, Malafosse A, Antonarakis S E. Dodecamer repeat expansion in cystatin B gene in progressive myoclonus epilepsy. Nature. (1997);386:847–851. [PubMed: 9126745]
  37. Lambert WC, Kuo H-R, Lambert MW (1998) Xeroderma pigmentosum and related disorders. In: Jameson JL (ed.) Principles of Molecular Medicine. Humana Press, NJ.
  38. Lehmann A R. Nucleotide excision repair and the link with transcription. Trends Biochem. Sci. (1995);20:402–405. [PubMed: 8533152]
  39. Lehrman M A, Schneider W J, Brown M S, Davis C G, Elhammer A, Russell D W, Goldstein J L. The Lebanese allele of the low density lipoprotein receptor locus: nonsense mutation produces truncated receptor that is retained in the endoplasmic reticulum. J. Biol. Chem. (1987);262:401–410. [PubMed: 3025214]
  40. Levinson G, Gutman G A. Slipped strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. (1987);4:203–221. [PubMed: 3328815]
  41. Li W-H, Graur D (1991) Fundamentals of Molecular Evolution. Sinauer Associates, Inc., Sunderland, MA.
  42. Lightowlers R N, Chinnery P F, Turnbull D M, Howell N. Mammalian mitochondrial genetics: heredity, heteroplasmy and disease. Trends Genet. (1997);13:450–455. [PubMed: 9385842]
  43. Mahtani M M, Willard H F. A polymorphic X-linked tetranucleotide repeat locus displaying a high rate of new mutation: implications for mechanisms of mutation at short tandem repeat loci. Hum. Molec. Genet. (1993);2:431–437. [PubMed: 8504304]
  44. Mazzarella R, Schlessinger D. Pathological consequences of sequence duplications in the human genome. Genome Res. (1998);8:1007–1021. [PubMed: 9799789]
  45. Nickerson D A, Taylor S L, Weiss K M. et al. DNA sequence diversity in a 9.7 kb region of the human lipoprotein lipase gene. Nature Genet. (1998);19:233–240. [PubMed: 9662394]
  46. Philips A V, Timchenko L T, Cooper T A. Disruption of splicing regulated by a CUG-binding protein in myotonic dystrophy. Science. (1998);280:737–741. [PubMed: 9563950]
  47. Piko L, Taylor K D. Amounts of mitochondrial DNA and abundance of some mitochondrial gene transcripts in early mouse embryos. Dev. Biol. (1987);123:364–374. [PubMed: 2443405]
  48. Poulton J, Holt I J. Mitochondrial DNA: does more lead to less? Nature Genet. (1994);8:313–315. [PubMed: 7894476]
  49. Poulton J, Macaulay V, Marchington D R. Is the bottleneck cracked? Am. J. Hum. Genet. (1998);62:752–757. [PMC free article: PMC1377049] [PubMed: 9529369]
  50. Richard I, Beckmann J S. How neutral are synonymous codon mutations? Nature Genet. (1995);10:259. [PubMed: 7670461]
  51. Sharp P, Matassi G. Codon usage and genome evolution. Curr. Opin. Genet. Dev. (1994);4:851–860. [PubMed: 7888755]
  52. Shimmin L C, Chang B H -J, Li W -H. Male driven evolution of DNA sequences. Nature. (1993);362:745–747. [PubMed: 8469284]
  53. Shoffner J M, Lott M T, Voljavec A S, Soueidan S A, Costigan D A, Wallace D C. Spontaneous Kearns-Sayre/chronic external ophthalmoplegia plus syndrome associated with a mitochondrial DNA deletion: a slip-replication model and metabolic therapy. Proc. Natl Acad. Sci. USA. (1989);86:7952–7956. [PMC free article: PMC298190] [PubMed: 2554297]
  54. Sinden R R. Biological implications of the DNA structures associated with disease-causing triplet repeats. Am. J. Hum. Genet. (1999);64:346–353. [PMC free article: PMC1377743] [PubMed: 9973271]
  55. Sinnott P J, Collier S, Costigan C, Dyer P A, Harris R, Strachan T. Genesis by meiotic unequal crossover of a de novo deletion that contributes to 21-hydroxylase deficiency. Proc. Natl Acad. Sci. USA. (1990);87:2107–2111. [PMC free article: PMC53635] [PubMed: 2315306]
  56. Taillon-Miller P, Gu Z, Li Q, Hillier L, Kwok P -Y. Overlapping genomic sequences: a treasure trove of single-nucleotide polymorphisms. Genome Res. (1998);8:748–754. [PMC free article: PMC310751] [PubMed: 9685323]
  57. Van der Horst G T J, Muijtjens M, Kobayashi K. et al. Mammalian Cry1 and Cry2 are essential for maintenance of circadian rhythms. Nature. (1999);398:627–630. [PubMed: 10217146]
  58. Vogel F, Motulsky AG (1996) Human Genetics. Problems and Approaches, 3rd edn. Springer-Verlag, Berlin.
  59. Wada K, Aota S, Tsuchiya R, Ishibashi F, Gojobori T, Ikemura T. Codon usage tabulated from the GenBank genetic sequence data. Nucl. Acid Res. (1990);18 (suppl.):2367–2400. [PMC free article: PMC331878] [PubMed: 2333226]
  60. Wallace M R, Andersen L B, Saulino A M, Gregory P E, Glover T W, Collins F S. A de novo Alu insertion results in neurofibromatosis type I. Nature. (1991);353:864–868. [PubMed: 1719426]
  61. Warren S T. Polyalanine expansion in synpolydactyly might result from unequal crossing-over of HOXD13. Science. (1997);275:408–409. [PubMed: 9005557]
  62. Weber J L, Wong C. Mutation of human short tandem repeats. Hum. Molec. Genet. (1993);2:1123–1128. [PubMed: 8401493]
  63. Whitfield L S, Lovell-Badge R, Goodfellow P N. Rapid sequence evolution of the mammalian sex-determining gene SRY. Nature. (1993);364:713–717. [PubMed: 8355783]
  64. Zhang H, Tombline G, Weber B L. BRCA1, BRCA2 and DNA damage response: collision or collusion? Cell. (1998);92:433–436. [PubMed: 9491884]
Copyright © 1999, Garland Science.
Bookshelf ID: NBK7566