NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Brown TA. Genomes. 2nd edition. Oxford: Wiley-Liss; 2002.

Chapter 15. How Genomes Evolve

Learning outcomes

When you have read Chapter 15, you should be able to:

  • Speculate on the events that led to evolution of the first genomes
  • Distinguish between the various ways in which genomes can obtain new genes
  • Using examples, discuss the possible impacts that duplication of whole genomes and of individual genes or groups of genes has had on genome evolution
  • Explain how new genes can arise by domain duplication and domain shuffling
  • Assess the likely impact of lateral gene transfer on genome evolution in bacteria and in eukaryotes
  • Outline how transposable elements may have influenced genome evolution
  • Define and evaluate the ‘introns early’ and ‘introns late’ hypotheses
  • List the differences between the human and chimpanzee genomes and discuss how such similar genomes can give rise to such different biological attributes

Mutation and recombination provide the genome with the means to evolve, but we learn very little about the evolutionary histories of genomes simply by studying these events in living cells. Instead we must combine our understanding of mutation and recombination with comparisons between the genomes of different organisms in order to infer the patterns of genome evolution that have occurred. Clearly, this approach is imprecise and uncertain but, as we will see, it is based on a surprisingly large amount of hard data and we can be reasonably confident that, at least in outline, the picture that emerges is not too far from the truth.

In this chapter we will explore the evolution of genomes from the very origins of biochemical systems through to the present day. We will look at ideas regarding the RNA world, prior to the appearance of the first DNA molecules, and then examine how DNA genomes have gradually become more complex. Finally, in Section 15.4 we will compare the human genome with the genomes of other primates in order to identify the evolutionary changes that have occurred during the last five million years and which must, somehow, make us what we are.

15.1. Genomes: the First 10 Billion Years

Cosmologists believe that the universe began some 14 billion years ago with the gigantic ‘primordial fireball’ called the Big Bang. Mathematical models suggest that after about 4 billion years galaxies began to fragment from the clouds of gas emitted by the Big Bang, and that within our own galaxy the solar nebula condensed to form the Sun and its planets about 4.6 billion years ago (Figure 15.1). The early Earth was covered with water and it was in this vast planetary ocean that the first biochemical systems appeared, cellular life being well established by the time land masses began to appear, some 3.5 billion years ago. But cellular life was a relatively late stage in biochemical evolution, being preceded by self-replicating polynucleotides that were the progenitors of the first genomes. We must begin our study of genome evolution with these precellular systems.

Figure 15.1. The origins of the universe, galaxies, solar system and cellular life.

15.1.1. The origins of genomes

The first oceans are thought to have had a similar salt composition to those of today but the Earth's atmosphere, and hence the dissolved gases in the oceans, was very different. The oxygen content of the atmosphere remained very low until photosynthesis evolved, and to begin with the most abundant gases were probably methane and ammonia. Experiments attempting to recreate the conditions in the ancient atmosphere have shown that electrical discharges in a methane-ammonia mixture result in chemical synthesis of a range of amino acids, including alanine, glycine, valine and several of the others found in proteins (Miller, 1953). Hydrogen cyanide and formaldehyde are also formed, these participating in additional reactions to give other amino acids, as well as purines, pyrimidines and, in less abundance, sugars. At least some of the building blocks of biomolecules could therefore have accumulated in the ancient chemosphere.

The first biochemical systems were centered on RNA

Polymerization of the building blocks into biomolecules might have occurred in the oceans or could have been promoted by the repeated condensation and drying of droplets of water in clouds (Woese, 1979). Alternatively, polymerization might have taken place on solid surfaces, perhaps making use of monomers immobilized on clay particles (Wächtershäuser, 1988), or in hydrothermal vents (Wächtershäuser, 1992). The precise mechanism need not concern us; what is important is that it is possible to envisage purely geochemical processes that could lead to synthesis of polymeric biomolecules similar to the ones found in living systems. It is the next steps that we must worry about. We have to go from a random collection of biomolecules to an ordered assemblage that displays at least some of the biochemical properties that we associate with life. These steps have never been reproduced experimentally and our ideas are therefore based mainly on speculation tempered by a certain amount of computer simulation. One problem is that the speculations are unconstrained because the global ocean could have contained as many as 10^10 biomolecules per liter and we can allow a billion years for the necessary events to take place. This means that even the most improbable scenarios cannot be dismissed out of hand and a way through the resulting maze has been difficult to find.

Progress was initially stalled by the apparent requirement that polynucleotides and polypeptides must work in harness in order to produce a self-reproducing biochemical system. This is because proteins are required to catalyze biochemical reactions but cannot carry out their own self-replication. Polynucleotides can specify the synthesis of proteins and self-replicate, but it was thought that they could do neither without the aid of proteins. It appeared that the biochemical system would have to spring fully formed from the random collection of biomolecules because any intermediate stage could not be perpetuated. The major breakthrough came in the mid-1980s when it was discovered that RNA can have catalytic activity. Those ribozymes that are known today carry out three types of biochemical reaction:

In the test tube, synthetic RNA molecules have been shown to carry out other biologically relevant reactions such as synthesis of ribonucleotides (Unrau and Bartel, 1998), synthesis and copying of RNA molecules (Ekland and Bartel, 1996; Johnston et al., 2001) and transfer of an RNA-bound amino acid to a second amino acid forming a dipeptide, in a manner analogous to the role of tRNA in protein synthesis (Section 11.1; Lohse and Szostak, 1996). The discovery of these catalytic properties solved the polynucleotide-polypeptide dilemma by showing that the first biochemical systems could have been centered entirely on RNA (Bartel and Unrau, 1999).

Ideas about the RNA world have taken shape in recent years (Robertson and Ellington, 1998). We now envisage that RNA molecules initially replicated in a slow and haphazard fashion simply by acting as templates for binding of complementary nucleotides which polymerized spontaneously (Figure 15.2). This process would have been very inaccurate so a variety of RNA sequences would have been generated, eventually leading to one or more with nascent ribozyme properties that were able to direct their own, more accurate self-replication. It is possible that a form of natural selection operated so that the most efficient replicating systems began to predominate, as has been shown to occur in experimental systems. A greater accuracy in replication would have enabled RNAs to increase in length without losing their sequence specificity, providing the potential for more sophisticated catalytic properties, possibly culminating in structures as complex as present-day Group I introns (see Figure 10.26) and ribosomal RNAs (see Figure 11.11).

Figure 15.2. Copying of RNA molecules in the early RNA world. Before the evolution of RNA polymerases, ribonucleotides that became associated with an RNA template would have had to polymerize spontaneously. This process would have been inaccurate and many RNA sequences ...

To call these RNAs ‘genomes’ is a little fanciful, but the term protogenome has attractions as a descriptor for molecules that are self-replicating and able to direct simple biochemical reactions. These reactions might have included energy metabolism, based, as today, on the release of free energy by hydrolysis of the phosphate-phosphate bonds in the ribonucleotides ATP and GTP, and the reactions might have become compartmentalized within lipid membranes, forming the first cell-like structures. There are difficulties in envisaging how long-chain unbranched lipids could form by chemical or ribozyme-catalyzed reactions, but once present in sufficient quantities they would have assembled spontaneously into membranes, possibly encapsulating one or more protogenomes and providing the RNAs with an enclosed environment in which more controlled biochemical reactions could be carried out.

The first DNA genomes

How did the RNA world develop into the DNA world? The first major change was probably the development of protein enzymes, which supplemented, and eventually replaced, most of the catalytic activities of ribozymes (Freeland et al., 1999). There are several unanswered questions relating to this stage of biochemical evolution, including the reason why the transition from RNA to protein occurred in the first place. Originally, it was assumed that the 20 amino acids in polypeptides provided proteins with greater chemical variability than the four ribonucleotides in RNA, enabling protein enzymes to catalyze a broader range of biochemical reactions, but this explanation has become less attractive as more and more ribozyme-catalyzed reactions have been demonstrated in the test tube. A more recent suggestion is that protein catalysis is more efficient because of the inherent flexibility of folded polypeptides compared with the greater rigidity of base-paired RNAs (Csermely, 1997). Alternatively, enclosure of RNA protogenomes within membrane vesicles could have prompted the evolution of the first proteins, because RNA molecules are hydrophilic and must be given a hydrophobic coat, for instance by attachment to peptide molecules, before being able to pass through or become integrated into a membrane (Walter et al., 2000).

The transition to protein catalysis demanded a radical shift in the function of the RNA protogenomes. Rather than being directly responsible for the biochemical reactions occurring in the early cell-like structures, the protogenomes became coding molecules whose main function was to specify the construction of the catalytic proteins. Whether the ribozymes themselves became coding molecules, or coding molecules were synthesized by the ribozymes is not known, although the most persuasive theories about the origins of translation and the genetic code suggest that the latter alternative is more likely to be correct (Figure 15.3; Szathmáry, 1993). Whatever the mechanism, the result was the paradoxical situation whereby the RNA protogenomes had abandoned their roles as enzymes, which they were good at, and taken on a coding function for which they were less well suited because of the relative instability of the RNA phosphodiester bond, resulting from the indirect effect of the 2′-OH group (Section 1.1.2). A transfer of the coding function to the more stable DNA seems almost inevitable and would not have been difficult to achieve, reduction of ribonucleotides giving deoxyribonucleotides which could then be polymerized into copies of the RNA protogenomes by a reverse-transcriptase-catalyzed reaction (Figure 15.4). The replacement of uracil with its methylated derivative thymine probably conferred even more stability on the DNA polynucleotide, and the adoption of double-stranded DNA as the coding molecule was almost certainly prompted by the possibility of repairing DNA damage by copying the partner strand (Sections 14.2.2 and 14.2.3).

Figure 15.3. Two scenarios for the evolution of the first coding RNA. A ribozyme could have evolved to have a dual catalytic and coding function (A), or a ribozyme could have synthesized a coding molecule (B). In both examples, the amino acids are shown attaching ...

Figure 15.4. Conversion of a coding RNA molecule into the progenitor of the first DNA genome.

According to this scenario, the first DNA genomes comprised many separate molecules, each specifying a single protein and each therefore equivalent to a single gene. The linking together of these genes into the first chromosomes, which could have occurred either before or after the transition to DNA, would have improved the efficiency of gene distribution during cell division, as it is easier to organize the equal distribution of a few large chromosomes than many separate genes. As with most stages in early genome evolution, several different mechanisms by which genes might have become linked have been proposed (Szathmáry and Maynard Smith, 1993).

How unique is life?

If the experimental simulations and computer models are correct then it is likely that the initial stages in biochemical evolution occurred many times in parallel in the oceans or atmosphere of the early Earth. It is therefore quite possible that ‘life’ arose on more than one occasion, even though all present-day organisms appear to derive from a single origin. This single origin is indicated by the remarkable similarity between the basic molecular biological and biochemical mechanisms in bacterial, archaeal and eukaryotic cells. To take just one example, there is no obvious biological or chemical reason why any particular triplet of nucleotides should code for any particular amino acid, but the genetic code, although not universal, is virtually the same in all organisms that have been studied. If these organisms derived from more than one origin then we would anticipate two or more very different codes.

If multiple origins are possible, but modern life is derived from just one, then at what stage did this particular biochemical system begin to predominate? The question cannot be answered precisely, but the most likely scenario is that the predominant system was the first to develop the means to synthesize protein enzymes and therefore probably also the first to adopt a DNA genome. The greater catalytic potential and more accurate replication conferred by protein enzymes and DNA genomes would have given these cells a significant advantage compared with those still containing RNA protogenomes. The DNA-RNA-protein cells would have multiplied more rapidly, enabling them to out-compete the RNA cells for nutrients which, before long, would have included the RNA cells themselves.

Are life forms based on informational molecules other than DNA and RNA possible? Orgel (2000) has reviewed the possibility that RNA was preceded by some other informational molecule at the very earliest period of biochemical evolution and concluded that a pyranosyl version of RNA, in which the sugar takes on a slightly different structure, might be a better choice than normal RNA for an early protogenome because the base-paired molecules that it forms are more stable (Beier et al., 1999; Eschenmoser, 1999). The same is true of peptide nucleic acid (PNA), a polynucleotide analog in which the sugar-phosphate backbone is replaced by amide bonds (Figure 15.5). PNAs have been synthesized in the test tube and have been shown to form base pairs with normal polynucleotides. However, there are no indications that either pyranosyl RNA or PNA was more likely than RNA to have evolved in the prebiotic soup.

Figure 15.5. A short stretch of peptide nucleic acid. A peptide nucleic acid has an amide backbone instead of the sugar-phosphate structure found in a standard nucleic acid.

15.2. Acquisition of New Genes

Although the very old fossil record is difficult to interpret, there is reasonably convincing evidence that by 3.5 billion years ago biochemical systems had evolved into cells similar in appearance to modern bacteria. We cannot tell from the fossils what kinds of genomes these first real cells had, but from the preceding section we can infer that they were made of double-stranded DNA and consisted of a small number of chromosomes, possibly just one, each containing many linked genes.

If we follow the fossil record forwards in time we see the first evidence for eukaryotic cells - structures resembling single-celled algae - about 1.4 billion years ago (Figure 15.6), and the first multicellular algae by 0.9 billion years ago. Multicellular animals appeared around 640 million years ago, although there are enigmatic burrows suggesting that animals lived earlier than this. The Cambrian Revolution, when invertebrate life proliferated into many novel forms, occurred 530 million years ago and ended with the disappearance of many of the novel forms in a mass extinction 500 million years ago. Since then, evolution has continued apace and with increasing diversification: the first terrestrial insects, animals and plants were established by 350 million years ago, the dinosaurs had been and gone by the end of the Cretaceous, 65 million years ago, and the first hominids appeared a mere 4.5 million years ago.

Figure 15.6. The evolution of life.

Morphological evolution was accompanied by genome evolution. It is dangerous to equate evolution with ‘progress’ but it is undeniable that as we move up the evolutionary tree we see increasingly complex genomes. One indication of this complexity is gene number, which varies from fewer than 1000 in some bacteria to 30 000–40 000 in vertebrates such as humans. However, this increase in gene number has not occurred in a gradual fashion: instead there seem to have been two sudden bursts when gene numbers increased dramatically (Bird, 1995). The first of these expansions occurred when eukaryotes appeared about 1.4 billion years ago, and involved an increase from the 5000 or fewer genes typical of prokaryotes to the 10 000 or more seen in most eukaryotes. The second expansion is associated with the first vertebrates, which became established soon after the end of the Cambrian, with each protovertebrate probably having at least 30 000 genes, this being the minimum number for any modern vertebrate, including the most ‘primitive’ types.

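There are two ways in which new genes could be acquired by a genome:

  • By duplicating some or all of the existing genes in the genome;
  • By acquiring genes from other species.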

Both events have been important in genome evolution, as we will see in the next two sections.

15.2.1. Acquisition of new genes by gene duplication

The duplication of existing genes is almost certainly the most important process for the generation of new genes during genome evolution. There are several ways in which it could occur:

  • By duplication of the entire genome;
  • By duplication of a single chromosome or part of a chromosome;
  • By duplication of a single gene or group of genes.

The second of these possibilities can probably be discounted as a major cause of gene number expansions based on our knowledge of the effects of chromosome duplications in modern organisms. Duplication of individual human chromosomes, resulting in a cell that contains three copies of one chromosome and two copies of all the others (the condition called trisomy), is either lethal or results in a genetic disease such as Down syndrome, and similar effects have been observed in artificially generated trisomic mutants of Drosophila. Probably, the resulting increase in copy numbers for some genes leads to an imbalance of the gene products and disruption of the cellular biochemistry (Ohno, 1970). The other two ways of generating new genes - whole-genome duplication and duplication of a single or small number of genes - have probably been much more important.

Whole-genome duplications can result in sudden expansions in gene number

The most rapid means of increasing gene number is by duplicating the entire genome. This can occur if an error during meiosis leads to the production of gametes that are diploid rather than haploid (Figure 15.7). If two diploid gametes fuse then the result will be a type of autopolyploid, in this case a tetraploid cell whose nucleus contains four copies of each chromosome.

Figure 15.7. The basis of autopolyploidization. The normal events occurring during meiosis are shown, in abbreviated form, on the left (compare with Figure 5.15). On the right, an aberration has occurred between prophase I and prophase II and the pairs of homologous ...

Autopolyploidy, as with other types of polyploidy (see page 475), is not uncommon among plants. Autopolyploids are often viable because each chromosome still has a homologous partner and so can form a bivalent during meiosis. This allows an autopolyploid to reproduce successfully, but generally prevents interbreeding with the original organism from which it was derived. This is because a cross between, for example, a tetraploid and diploid would give a triploid offspring which would not itself be able to reproduce because one full set of its chromosomes would lack homologous partners (Figure 15.8). Autopolyploidy is therefore a mechanism by which speciation can occur, a pair of species usually being defined as two organisms that are unable to interbreed. The generation of new plant species by autopolyploidy has in fact been observed, notably by Hugo de Vries, one of the rediscoverers of Mendel's experiments. During his work with evening primrose, Oenothera lamarckiana, de Vries isolated a tetraploid version of this normally diploid plant, which he named Oenothera gigas. Autopolyploidy among animals is less common, especially in those with two distinct sexes, possibly because of problems that arise if a nucleus possesses more than one pair of sex chromosomes.
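The chromosome arithmetic behind this argument can be made concrete with a small sketch. The Python below is only an illustration under simplifying assumptions: a gamete is represented solely by the number of chromosome sets it carries, and a nucleus is treated as able to complete meiosis only if it holds an even number of sets, so that every chromosome can find a homologous partner. All function names are hypothetical.

```python
def fuse(gamete_a_sets: int, gamete_b_sets: int) -> int:
    """Number of chromosome sets in the nucleus formed when two gametes fuse."""
    return gamete_a_sets + gamete_b_sets


def can_pair_all_chromosomes(sets: int) -> bool:
    """Simplifying assumption: every chromosome can form a bivalent
    only when the number of chromosome sets is even."""
    return sets > 0 and sets % 2 == 0


def normal_gamete(sets: int) -> int:
    """Sets carried by a gamete after a normal meiosis (half the parent's)."""
    if not can_pair_all_chromosomes(sets):
        raise ValueError("chromosomes cannot all pair, so meiosis fails")
    return sets // 2


diploid = 2
# Aberrant meiosis: each gamete keeps both sets, and two such gametes fuse.
tetraploid = fuse(diploid, diploid)
# The tetraploid can reproduce with its own kind.
offspring = fuse(normal_gamete(tetraploid), normal_gamete(tetraploid))
# A cross back to the diploid parent gives a triploid.
hybrid = fuse(normal_gamete(tetraploid), normal_gamete(diploid))

print(tetraploid, offspring, hybrid)
print(can_pair_all_chromosomes(hybrid))
```

In this toy model the triploid hybrid fails the pairing test, which is the reproductive barrier described in the text.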

Figure 15.8. Autopolyploids cannot interbreed successfully with their parents. Fusion of the diploid gamete produced by the aberrant meiosis shown in Figure 15.7 with a haploid gamete produced by the normal meiosis leads to a triploid nucleus, one that has three copies ...

Autopolyploidy does not lead directly to gene expansion because the initial product is an organism that simply has extra copies of every gene, rather than any new genes. It does, however, provide the potential for gene expansion because the extra genes are not essential to the functioning of the cell and so can undergo mutational change without harming the viability of the organism. With many genes, the resulting changes in nucleotide sequence will be deleterious and the end result will be an inactive pseudogene, but occasionally the mutations will lead to a new gene function that is useful to the cell. This aspect of genome evolution is more clearly illustrated by considering duplications of single genes rather than of entire genomes, so we will postpone a full discussion of it until the next section.

Are there any indications of genome duplication in the evolutionary histories of present-day genomes? From what we understand about the way in which genomes change over time, we might anticipate that evidence for whole-genome duplication would be quite difficult to obtain. Many of the extra gene copies resulting from genome duplication would be expected to decay into pseudogenes and no longer be visible in the DNA sequence. Those genes that are retained, because their duplicated function is useful to the organism or because they have evolved new functions, should be identifiable, but it would be impossible to tell whether they arose by genome duplication or simply by duplication of individual genes. For a genome duplication to be signaled it would be necessary to find duplicated sets of genes, with the same order of genes in both sets. To what extent these duplicated sets are still visible in the genome will depend on how frequently past recombination events have moved genes to new positions. This type of analysis has been applied to the Saccharomyces cerevisiae DNA sequence, leading to the suggestion that this genome is the product of a duplication that took place approximately 100 million years ago (Wolfe and Shields, 1997; Research Briefing 15.1), but this hypothesis is still controversial (Piskur, 2001). Comparisons between the Arabidopsis thaliana genome sequence and segments of other plant genomes suggest that the ancestor of the A. thaliana genome underwent four rounds of genome duplication between 100 and 200 million years ago (Vision et al., 2000; Bancroft, 2001). The increased number of Hox gene clusters present in some types of fish (see page 472) has been used as an argument for a duplication event in the genomic lineage leading to these organisms (Taylor et al., 2001).
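The idea of searching for duplicated sets of genes that share the same gene order can be sketched in code. The Python below is a minimal illustration and not the method used in the studies cited above. It assumes that each genomic region has already been reduced to an ordered list of gene-family labels, and it finds the longest chain of matching families that occur in the same order in both regions, tolerating a few intervening genes. The function name, the max_gap parameter and the family labels are all hypothetical.

```python
from typing import List, Tuple


def longest_collinear_block(
    region_a: List[str], region_b: List[str], max_gap: int = 2
) -> List[Tuple[int, int]]:
    """Return the longest chain of (index_a, index_b) pairs whose gene
    families match and appear in the same order in both regions.
    Up to `max_gap` unmatched genes may lie between consecutive pairs."""
    # All positions where the two regions carry a gene of the same family.
    # The comprehension yields pairs sorted by index_a, then index_b.
    matches = [
        (i, j)
        for i, fam_a in enumerate(region_a)
        for j, fam_b in enumerate(region_b)
        if fam_a == fam_b
    ]
    if not matches:
        return []
    best_len = [1] * len(matches)   # longest chain ending at each match
    prev = [-1] * len(matches)      # back-pointer used to rebuild the chain
    for k, (i, j) in enumerate(matches):
        for m in range(k):
            pi, pj = matches[m]
            in_order = pi < i and pj < j
            close = i - pi <= max_gap + 1 and j - pj <= max_gap + 1
            if in_order and close and best_len[m] + 1 > best_len[k]:
                best_len[k] = best_len[m] + 1
                prev[k] = m
    k = max(range(len(matches)), key=best_len.__getitem__)
    chain = []
    while k != -1:
        chain.append(matches[k])
        k = prev[k]
    return chain[::-1]


# Two regions that share four families in the same order, with some
# unrelated genes (X1, X2, Y1, Y2) interspersed.
region_a = ["kinase", "X1", "transporter", "histone", "X2", "protease"]
region_b = ["kinase", "transporter", "Y1", "histone", "protease", "Y2"]
# A region with the same families in reversed order.
region_c = ["protease", "histone", "transporter", "kinase"]

print(longest_collinear_block(region_a, region_b))
print(len(longest_collinear_block(region_a, region_c)))
```

A long chain, as between region_a and region_b, is the kind of signal that would be consistent with a shared duplication event, whereas scrambled order, as in region_c, gives no chain longer than a single gene. Real analyses must also decide how many chained genes are needed before chance matches can be excluded, which this sketch does not attempt.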

Box 15.1. Segmental duplications in the yeast and human genomes. Examination of the yeast and human genomes reveals evidence of past duplication events. Duplication of individual genes has been recognized for some time as having played an important role in genome ...

Duplications of individual genes and groups of genes have occurred frequently in the past

If genome duplication has not been a common evolutionary event, then increases in gene number must have occurred primarily by duplications of individual genes and small groups of genes. This hypothesis is supported by DNA sequencing, which has revealed that multigene families are common components of all genomes (Section 2.2.1). By comparing the sequences of individual members of a family (using the techniques described in Chapter 16) it is usually possible to trace the individual gene duplications involved in evolution of the family from a single progenitor gene that existed in an ancestral genome (Figure 15.9; Henikoff et al., 1997). There are several mechanisms by which these gene duplications could have occurred:

  • Unequal crossing-over is a recombination event initiated by similar nucleotide sequences that are not at identical places in a pair of homologous chromosomes. As shown in Figure 15.10A, the result of unequal crossing-over can be duplication of a segment of DNA in one of the recombination products.
  • Unequal sister chromatid exchange occurs by the same mechanism as unequal crossing-over, but involves a pair of chromatids from a single chromosome (see Figure 15.10B).
  • DNA amplification is sometimes used in this context to describe gene duplication in bacteria and other haploid organisms (Romero and Palacios, 1997), in which duplications can arise by unequal recombination between the two daughter DNA molecules in a replication bubble (Figure 15.10C).
  • Replication slippage (see Figure 14.5) could result in gene duplication if the genes are relatively short, although this process is more commonly associated with the duplication of very short sequences such as the repeat units in microsatellites.
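The outcome of unequal crossing-over can be illustrated with a toy model. In the Python sketch below, a chromosome is assumed to be an ordered list of labeled segments, with "R" standing for a short repeated sequence that flanks a gene. The crossover function and the segment labels are hypothetical, and the same function could equally represent unequal sister chromatid exchange if the two inputs are taken to be chromatids of one chromosome.

```python
from typing import List, Tuple

Chromosome = List[str]


def crossover(hom1: Chromosome, pos1: int,
              hom2: Chromosome, pos2: int) -> Tuple[Chromosome, Chromosome]:
    """Exchange the arms of two chromosomes at the given repeat positions.
    Both positions must hold the same repeat, because recombination is
    initiated between similar nucleotide sequences."""
    if hom1[pos1] != hom2[pos2]:
        raise ValueError("recombination needs similar sequences at both sites")
    product1 = hom1[:pos1 + 1] + hom2[pos2 + 1:]
    product2 = hom2[:pos2 + 1] + hom1[pos1 + 1:]
    return product1, product2


homolog1 = ["A", "R", "gene", "R", "B"]
homolog2 = ["A", "R", "gene", "R", "B"]

# Equal crossing-over: the same copy of the repeat pairs on both homologs.
equal = crossover(homolog1, 1, homolog2, 1)

# Unequal crossing-over: the second repeat on one homolog pairs with the
# first repeat on the other.
duplicated, deleted = crossover(homolog1, 3, homolog2, 1)

print(equal)
print(duplicated)
print(deleted)
```

In this model the unequal event yields one product with two copies of the gene and a reciprocal product with none, while the equal event leaves both chromosomes unchanged.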

Figure 15.9. Gene duplications during the evolution of the human globin gene families. Comparisons of their nucleotide sequences enable the evolutionary relationships between the globin genes to be deduced, using the molecular phylogenetics techniques described in ...

Figure 15.10. Models for gene duplication by (A) unequal crossing-over between homologous chromosomes, (B) unequal sister chromatid exchange, and (C) during replication of a bacterial genome. In each case, recombination occurs between two different copies of a short ...

The initial result of gene duplication is two identical genes. As mentioned above with regard to genome duplication, selective constraints will ensure that one of these genes retains its original nucleotide sequence, or something very similar to it, so that it can continue to provide the protein function that was originally supplied by the single gene copy before the duplication took place. The second copy is probably not subject to the same selective pressures and so can accumulate mutations at random. Evidence shows that the majority of new genes that arise by duplication acquire deleterious mutations that inactivate them so that they become pseudogenes (Wagner, 2001). From the sequences of the pseudogenes in the α- and β-globin gene families (Figure 2.14), it appears that the commonest inactivating mutations are frameshifts and nonsense mutations that occur within the coding region of the gene, with mutations of the initiation codon and TATA box being less frequent.
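The kinds of inactivating mutation listed above can be recognized with very simple sequence checks. The Python below is a rough sketch under stated assumptions: the reference is an intact coding sequence whose length is a multiple of three, it begins with ATG and ends with a stop codon, and only the coding region is examined, so promoter changes such as TATA box mutations are outside its scope. The function name and the example sequences are invented for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_copy(reference: str, copy: str) -> str:
    """Report the first inactivating change found in a duplicated gene copy."""
    if not copy.startswith("ATG"):
        return "initiation codon mutation"
    # An insertion or deletion that is not a multiple of three shifts the frame.
    if (len(copy) - len(reference)) % 3 != 0:
        return "frameshift"
    # Look for a stop codon anywhere before the final codon.
    internal_codons = [copy[i:i + 3] for i in range(0, len(copy) - 3, 3)]
    if any(codon in STOP_CODONS for codon in internal_codons):
        return "nonsense mutation"
    return "no inactivating mutation detected"


reference = "ATGGCTAAAGGTTAA"        # ATG GCT AAA GGT TAA
nonsense_copy = "ATGGCTTAAGGTTAA"    # AAA changed to TAA
frameshift_copy = "ATGGCAAAGGTTAA"   # one nucleotide deleted
start_copy = "ACGGCTAAAGGTTAA"       # ATG changed to ACG

for seq in (reference, nonsense_copy, frameshift_copy, start_copy):
    print(classify_copy(reference, seq))
```

The sketch reports only the first problem it encounters. In a real pseudogene a frameshift will usually bring a premature stop codon into frame further downstream, so the categories are not mutually exclusive.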

Occasionally, the mutations that accumulate within a gene copy do not lead to inactivation of the gene, but instead result in a new gene function that is useful to the organism. We have already seen that gene duplication in the globin gene families led to the evolution of new globin proteins that are used by the organism at different stages in its development (see Figure 2.14). We also noted (page 45) that all the globin genes, both the α- and β-types, are related and hence form a gene superfamily that originated with a single ancestral globin gene that split to give the proto-α and proto-β globins about 500 million years ago (see Figure 15.9). Further back, about 800 million years ago, this ancestral globin gene itself arose by gene duplication, its sister duplicate evolving to give the modern gene for myoglobin, a muscle protein whose main function, like that of the globins, is the storage of oxygen (Doolittle, 1987). We observe similar patterns of evolution when we compare the sequences of other genes. The trypsin and chymotrypsin genes, for example, are related by a common ancestor approximately 1500 million years ago (Barker and Dayhoff, 1980). Both now code for proteases involved in protein breakdown in the vertebrate digestive tract, trypsin cutting other proteins at arginine and lysine amino acids and chymotrypsin cutting at phenylalanines, tryptophans and tyrosines. Genome evolution has therefore produced two complementary protein functions where originally there was just one.

The most striking example of gene evolution by duplication, whether by duplication of a small group of genes or by whole-genome duplication, is provided by the homeotic selector genes, the key developmental genes responsible for specification of the body plans of animals. As described in Section 12.3.3, Drosophila has a single cluster of homeotic selector genes, called HOM-C, which consists of eight genes each containing a homeodomain sequence coding for a DNA-binding motif in the protein product (see Figure 12.30). These eight genes, as well as other homeodomain genes in Drosophila, are believed to have arisen by a series of gene duplications that began with an ancestral gene that existed about 1000 million years ago. The functions of the modern genes, each specifying the identity of a different segment of the fruit fly, give us a tantalizing glimpse of how gene duplication and sequence divergence could, in this case, have been the underlying processes responsible for increasing the morphological complexity of the series of organisms in the Drosophila evolutionary tree.

Vertebrates have four Hox gene clusters (see Figure 12.30), each a recognizable copy of the Drosophila cluster, with sequence similarities between genes in equivalent positions. Not all of the vertebrate Hox genes have been ascribed functions, but we believe that the additional versions possessed by vertebrates relate to the added complexity of the vertebrate body plan. Two observations support this conclusion. The amphioxus, an invertebrate that displays some primitive vertebrate features, has two Hox clusters (Brooke et al., 1998), which is what we might expect for a primitive ‘protovertebrate’. Ray-finned fishes, probably the most diverse group of vertebrates with a vast range of different variations of the basic body plan, have seven Hox clusters (Amores et al., 1998).

Gene duplication is not always followed by sequence divergence and the evolution of a family of genes with different functions. Some multigene families are made up of genes with identical or near-identical sequences. The prime examples are the rRNA genes, whose copy numbers range from two in Mycoplasma genitalium to 500+ in Xenopus laevis (Section 2.2.1), with all of the copies having virtually the same sequence. These multiple copies of identical genes presumably reflect the need for rapid synthesis of the gene product at certain stages of the cell cycle. With these gene families there must be a mechanism that prevents the individual copies from accumulating mutations and hence diverging away from the functional sequence. This is called concerted evolution. If one copy of the family acquires an advantageous mutation then it is possible for that mutation to spread throughout the family until all members possess it. The most likely way in which this can be achieved is by gene conversion which, as described in Section 14.3.1, can result in the sequence of one copy of a gene being replaced with all or part of the sequence of a second copy. Multiple gene conversion events could therefore maintain identity among the sequences of the individual members of a multigene family.
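A toy model may help to show how gene conversion can either restore identity within a family or spread a new mutation through it. This is a minimal sketch under stated assumptions: the family members are hypothetical short sequences of equal length, and the function `gene_conversion` is our own illustration rather than a description of the molecular mechanism.

```python
def gene_conversion(family, donor, recipient, start=0, end=None):
    """Return a new family in which part (or all) of the recipient copy
    is overwritten with the matching stretch of the donor copy."""
    donor_seq, recipient_seq = family[donor], family[recipient]
    end = len(donor_seq) if end is None else end
    converted = recipient_seq[:start] + donor_seq[start:end] + recipient_seq[end:]
    updated = list(family)
    updated[recipient] = converted
    return updated

# Hypothetical three-copy family; copy 1 has acquired a point mutation (G to T).
family = ["ACGTACGT", "ACGTACTT", "ACGTACGT"]

# Conversion with copy 0 as donor removes the mutation...
repaired = gene_conversion(family, donor=0, recipient=1)
print(repaired)  # ['ACGTACGT', 'ACGTACGT', 'ACGTACGT']

# ...whereas conversions with copy 1 as donor spread it to every member.
spread = gene_conversion(family, donor=1, recipient=0)
spread = gene_conversion(spread, donor=1, recipient=2)
print(spread)    # ['ACGTACTT', 'ACGTACTT', 'ACGTACTT']
```

In both outcomes the family ends up homogeneous, which is the defining feature of concerted evolution; which sequence prevails depends on the direction of the conversion events.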

Box 15.1

Gene duplication and genetic redundancy. The text adopts the conventional scenario which states that after a duplication, one of the two gene copies can accumulate mutations which either result in inactivation of that gene copy or lead to a new gene function. (more...)

Genome evolution also involves rearrangement of existing genes

As well as the generation of new genes by duplication followed by mutation, novel protein functions can also be produced by rearranging existing genes. This is possible because most proteins are made up of structural domains (Section 3.3.3), each comprising a segment of the polypeptide chain and hence encoded by a contiguous series of nucleotides (Figure 15.11). There are two ways in which rearrangement of domain-encoding gene segments can result in novel protein functions.

Figure 15.11. Structural domains are individual units in a polypeptide chain coded by a contiguous series of nucleotides. In this simplified example, each secondary structure in the polypeptide is looked upon as an individual structural domain. In reality, most structural (more...)

  • Domain duplication occurs when the gene segment coding for a structural domain is duplicated by unequal crossing-over, replication slippage or one of the other methods that we have considered for duplication of DNA sequences (Figure 15.12A). Duplication results in the structural domain being repeated in the protein, which might itself be advantageous, for example by making the protein product more stable. The duplicated domain might also change over time as its coding sequence becomes mutated, leading to a modified structure that might provide the protein with a new activity. Note that domain duplication causes the gene to become longer. Gene elongation appears to be a general consequence of genome evolution, the genes of higher eukaryotes being longer, on average, than those of lower organisms.
  • Domain shuffling occurs when segments coding for structural domains from completely different genes are joined together to form a new coding sequence that specifies a hybrid or mosaic protein, one that would have a novel combination of structural features and might provide the cell with an entirely new biochemical function (Figure 15.12B).
Figure 15.12. Creating new genes by (A) domain duplication and (B) domain shuffling.
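The two processes can be expressed as simple operations on a gene written as an ordered list of domain-coding segments. The sketch below is purely illustrative: the genes and segment labels are hypothetical, and the function names are our own.

```python
def duplicate_domain(gene, index):
    """Domain duplication: repeat the segment at `index` in tandem."""
    return gene[:index + 1] + [gene[index]] + gene[index + 1:]

def shuffle_domains(*picks):
    """Domain shuffling: join segments taken from different genes.
    Each pick is a (gene, index) pair."""
    return [gene[index] for gene, index in picks]

# Hypothetical genes, written as ordered lists of domain-coding segments.
gene_a = ["A1", "A2", "A3"]
gene_b = ["B1", "B2"]

longer = duplicate_domain(gene_a, 1)
mosaic = shuffle_domains((gene_a, 0), (gene_b, 1), (gene_a, 2))
print(longer)  # ['A1', 'A2', 'A2', 'A3']
print(mosaic)  # ['A1', 'B2', 'A3']
```

Duplication lengthens the gene, whereas shuffling produces a mosaic whose length depends on the segments that are combined.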


Implicit in these models of domain duplication and shuffling is the need for the relevant gene segments to be separated so that they can themselves be rearranged and shuffled. This requirement has led to the attractive suggestion that exons might code for structural domains. With some proteins, duplication or shuffling of exons does seem to have resulted in the structures seen today. An example is provided by the α2 Type I collagen gene of vertebrates, which codes for one of the three polypeptide chains of collagen. Each of the three collagen polypeptides has a highly repetitive sequence made up of repeats of the tripeptide glycine-X-Y, where X is usually proline and Y is usually hydroxyproline (Figure 15.13). The α2 Type I gene, which codes for 338 of these repeats, is split into 52 exons, 42 of which cover the part of the gene coding for the glycine-X-Y repeats. Within this region, each exon encodes a set of complete tripeptide repeats. The number of repeats per exon varies but is 5 (5 exons), 6 (23 exons), 11 (5 exons), 12 (8 exons) or 18 (1 exon). Clearly this gene could have evolved by duplication of exons leading to repetition of the structural domains.
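The exon figures quoted for the α2 Type I collagen gene can be tallied directly. The short calculation below uses only the numbers given in the text, plus the fact that one tripeptide repeat is encoded by nine nucleotides; the variable names are our own.

```python
# (tripeptide repeats per exon, number of exons), as quoted in the text
exon_classes = [(5, 5), (6, 23), (11, 5), (12, 8), (18, 1)]

total_exons = sum(n_exons for _, n_exons in exon_classes)
total_repeats = sum(repeats * n_exons for repeats, n_exons in exon_classes)

# Each Gly-X-Y repeat is 3 amino acids, hence 9 nucleotides of coding sequence.
exon_lengths_bp = [repeats * 9 for repeats, _ in exon_classes]

print(total_exons)      # 42
print(total_repeats)    # 332
print(exon_lengths_bp)  # [45, 54, 99, 108, 162]
```

The tally reproduces the 42 exons of the repeat region. The repeats listed per exon sum to 332, slightly fewer than the 338 quoted for the gene as a whole; the text does not say where the difference arises. Every exon length is a multiple of nine nucleotides, as expected if each exon carries a whole number of tripeptide repeats.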

Figure 15.13. The α2 Type I collagen polypeptide has a repetitive sequence described as Gly-X-Y. Every third amino acid is glycine, X is often proline and Y is often hydroxyproline (Hyp). See Table 3.1 for other amino acid abbreviations. Hydroxyproline is a (more...)

Domain shuffling is illustrated by tissue plasminogen activator (TPA), a protein found in the blood of vertebrates and which is involved in the blood clotting response. The TPA gene has four exons, each coding for a different structural domain (Figure 15.14). The upstream exon codes for a ‘finger’ module that enables the TPA protein to bind to fibrin, a fibrous protein found in blood clots and which activates TPA. This exon appears to be derived from a second fibrin-binding protein, fibronectin, and is absent from the gene for a related protein, urokinase, which is not activated by fibrin. The second TPA exon specifies a growth-factor domain which has apparently been obtained from the gene for epidermal growth factor and which may enable TPA to stimulate cell proliferation. The last two exons code for ‘kringle’ structures which TPA uses to bind to fibrin clots; these kringle exons come from the plasminogen gene (Li and Graur, 1991).

Figure 15.14. The modular structure of the tissue plasminogen activator protein. See the text for details.

Type I collagen and TPA provide elegant examples of gene evolution but, unfortunately, the clear links that they display between structural domains and exons are exceptional and are rarely seen with other genes. Many other genes appear to have evolved by duplication and shuffling of segments, but in these the structural domains are coded by segments of genes that do not coincide with individual exons or even groups of exons. Domain duplication and shuffling still occur, but presumably in a less precise manner and with many of the rearranged genes having no useful function. Despite being haphazard, the process clearly works, as indicated by, among other examples, the number of proteins that share the same DNA-binding motifs (Section 9.1.4). Several of these motifs probably evolved de novo on more than one occasion, but it is clear that in many cases the nucleotide sequence coding for the motif has been transferred to a variety of different genes.

15.2.2. Acquisition of new genes from other species

The second possible way in which a genome can acquire new genes is to obtain them from another species. Comparisons of bacterial and archaeal genome sequences suggest that lateral gene transfer has been a major event in the evolution of prokaryotic genomes (Section 2.3.2). The genomes of most bacteria and archaea contain at least a few hundred kb of DNA, representing tens of genes, that appears to have been acquired from a second prokaryote.

There are several mechanisms by which genes can be transferred between prokaryotes but it is difficult to be sure how important these various processes have been in shaping the genomes of these organisms. Conjugation (Section 5.2.4), for example, enables plasmids to move between bacteria and frequently results in the acquisition of new gene functions by the recipients. On a day-to-day basis, plasmid transfer is important because it is the means by which genes for resistance to antibiotics such as chloramphenicol, kanamycin and streptomycin spread through bacterial populations and across species barriers, but its evolutionary relevance is questionable. It is true that the genes transferred by conjugation can become integrated into the recipient bacterium's genome, but usually the genes are carried by composite transposons (see Figure 2.29B), which means that the integration is reversible and so might not result in a permanent change to the genome. A second process for DNA transfer between prokaryotes, transformation (Section 5.2.4), is more likely to have had an influence on genome evolution. Only a few bacteria, notably members of the Bacillus, Pseudomonas and Streptococcus genera, have efficient mechanisms for the uptake of DNA from the surrounding environment, but efficiency of DNA uptake is probably not relevant when we are dealing with an evolutionary time-scale. More important is the fact that gene flow by transformation can occur between any pair of prokaryotes, not just closely related ones (as is the case with conjugation), and so could account for the transfers that appear to have occurred between bacterial and archaeal genomes (Section 2.3.2).

In plants, new genes can be acquired by polyploidization. We have already seen how autopolyploidization can result in genome duplication in plants (see Figure 15.7). Allopolyploidy, which results from interbreeding between two different species, is also common and, like autopolyploidy, can result in a viable hybrid. Usually, the two species that form the allopolyploid are closely related and have many genes in common, but each parent will possess a few novel genes or at least distinctive alleles of shared genes. For example, the bread wheat, Triticum aestivum, is a hexaploid that arose by allopolyploidization between cultivated emmer wheat, T. turgidum, which is a tetraploid, and a diploid wild grass, Aegilops squarrosa. The wild-grass nucleus contained novel alleles for the high-molecular-weight glutenin genes which, when combined with the glutenin alleles already present in emmer wheat, resulted in the superior properties for breadmaking displayed by the hexaploid wheats. Allopolyploidization can therefore be looked upon as a combination of genome duplication and interspecies gene transfer.

Among animals, the species barriers are less easy to cross and it is difficult to find clear evidence for lateral gene transfer of any kind. Several eukaryotic genes have features associated with archaeal or bacterial sequences, but rather than being the result of lateral gene transfer, these similarities are thought to result from conservation during millions of years of parallel evolution. Most proposals for gene transfer between animal species center on retroviruses and transposable elements. Transfer of retroviruses between animal species is well documented, as is their ability to carry animal genes between individuals of the same species, suggesting that they might be possible mediators of lateral gene transfer. The same could be true of transposable elements such as P elements, which are known to spread from one Drosophila species to another, and mariner, which has also been shown to transfer between Drosophila species and which may have crossed from other species into humans (Robertson et al., 1996; Hartl et al., 1997).

15.3. Non-coding DNA and Genome Evolution

So far we have concentrated our attention on the evolution of the coding component of the genome. As coding DNA makes up only 1.5% of the human genome (see Box 1.4) our view of genome evolution would be very incomplete if we did not devote some time to considering non-coding DNA. The problem is that in many respects there is little that can be said about the evolution of non-coding DNA. We envisage that duplications and other rearrangements have occurred through recombination and replication slippage, and that sequences have diverged through accumulation of mutations unfettered by the restraining selective forces acting on functional regions of the genome. We recognize that some parts of the non-coding DNA, for example the regulatory regions upstream of genes, have important functions, but as far as most of the non-coding DNA is concerned, all we can say is that it evolves in an apparently random fashion.

This randomness does not apply to all components of the non-coding DNA. In particular, transposable elements and introns have interesting evolutionary histories and are of general importance in genome evolution, as described in the following two sections.

15.3.1. Transposable elements and genome evolution

Transposable elements have a number of effects on evolution of the genome as a whole. The most significant of these is the ability of transposons to initiate recombination events that lead to genome rearrangements. This has nothing to do with the transposable activity of these elements; it simply relates to the fact that different copies of the same element have similar sequences and can therefore initiate recombination between two parts of the same chromosome or between different chromosomes (Figure 15.15). In many cases, the resulting rearrangement will be harmful because important genes will be deleted, but some instances where the result has been beneficial have been documented. Recombination between a pair of LINE-1 elements (Section 2.4.2) approximately 35 million years ago is thought to have caused the β-globin gene duplication that resulted in the Gγ and Aγ members of this gene family (see Figure 15.9; Maeda and Smithies, 1986).
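One way in which recombination between repeated elements can duplicate a gene is by misaligned pairing of two copies on homologous chromosomes. The sketch below models this single case, with chromosomes written as lists of labelled segments. The chromosome, the element label "TE" and the function `recombine` are hypothetical and are not a reconstruction of the globin event itself.

```python
def recombine(chrom_1, pos_1, chrom_2, pos_2):
    """Reciprocal crossover inside the element at pos_1 of chrom_1
    and the element at pos_2 of chrom_2."""
    if chrom_1[pos_1] != chrom_2[pos_2]:
        raise ValueError("crossover requires two copies of the same element")
    product_1 = chrom_1[:pos_1 + 1] + chrom_2[pos_2 + 1:]
    product_2 = chrom_2[:pos_2 + 1] + chrom_1[pos_1 + 1:]
    return product_1, product_2

# Hypothetical chromosome: a gene flanked by two copies of a repeated element.
homolog = ["x", "TE", "gene", "TE", "y"]

# Misaligned pairing: the downstream element of one homolog pairs with
# the upstream element of the other.
duplicated, deleted = recombine(homolog, 3, homolog, 1)
print(duplicated)  # ['x', 'TE', 'gene', 'TE', 'gene', 'TE', 'y']
print(deleted)     # ['x', 'TE', 'y']
```

One product carries two copies of the gene and the other has lost it, which illustrates why such events are often harmful but occasionally provide the raw material for a new gene family member.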

Figure 15.15. Transposons can initiate recombination events between chromosomes or between different sites on the same chromosome.


Movement of transposons from one site to another can also have an impact on genome evolution. The transposition of a LINE-1 element can occasionally result in a short piece of the adjacent DNA being transferred along with the transposon, a process called 3′ transduction, the transferred segment being located at the 3′ end of the element. LINE-1 elements are sometimes found in introns so 3′ transduction could conceivably move downstream exons to new sites in the genome (Kazazian, 2000). Transposition has also been associated with altered patterns of gene expression. For example, the efficiency with which DNA-binding proteins that are attached to upstream regulatory sequences can activate transcription of a gene might be affected if a transposon moves into a new site immediately upstream of the gene (Figure 15.16). Transcription of the gene might also be influenced by the presence of promoters and/or enhancers within the transposon, so the gene becomes subject to an entirely new regulatory regime (McDonald, 1995). An interesting example of transposon-directed gene expression occurs with the mouse gene Slp, which codes for a protein involved in the immune response, the tissue specificity of Slp being conferred by an enhancer located within an adjacent retrotransposon (Stavenhagen and Robins, 1988). There are also examples where insertion of a transposon into a gene has resulted in an altered splicing pattern (Purugganan and Wessler, 1992).

Figure 15.16. Insertion of a transposon into the region upstream of a gene could affect the ability of DNA-binding proteins to activate transcription.


Box 15.2

The origin of a microsatellite. There is no mystery about the origins of microsatellite repeat sequences (Section 2.4.1). A dimeric microsatellite, consisting of two repeat units in tandem array, can easily arise by chance mutational events. Replication (more...)

15.3.2. The origins of introns

Ever since introns were first discovered in the 1970s their origins have been debated. There are few controversies surrounding the Group I, II and III types (see Table 10.2) as it is generally accepted that all these self-splicing introns evolved in the RNA world and have survived ever since without undergoing a great deal of change. The problems surround the origins of the GU-AG introns, the ones that are found in large numbers in eukaryotic nuclear genomes.

‘Introns early’ and ‘introns late’: two competing hypotheses

A number of proposals for the origins of GU-AG introns have been put forward but the debate is generally considered to be between two opposing hypotheses:

  • Introns early states that introns are very ancient and are gradually being lost from eukaryotic genomes.
  • Introns late states that introns evolved relatively recently and are gradually accumulating in eukaryotic genomes.

There are several different models for each hypothesis. For ‘introns early’ the most persuasive model is the one also called the ‘exon theory of genes’ (Gilbert, 1987) which holds that introns were formed when the first DNA genomes were constructed, soon after the end of the RNA world. These genomes would have contained many short genes, each derived from a single coding RNA molecule and each specifying a very small polypeptide, perhaps just a single structural domain. These polypeptides would probably have had to associate together into larger multidomain proteins in order to produce enzymes with specific and efficient catalytic mechanisms (Figure 15.17). To aid the synthesis of a multidomain enzyme it would have been beneficial for the enzyme's individual polypeptides to become linked into a single protein, such as we see today. This was achieved by splicing together the transcripts of the relevant minigenes, a process that was aided by rearranging the genome so that groups of minigenes specifying the different parts of individual multidomain proteins were positioned next to each other. In other words, the minigenes became exons and the DNA sequences between them became introns.

Figure 15.17. The ‘exon theory of genes’. The short genes of the first genomes probably coded for single-domain polypeptides that would have had to associate together to form a multisubunit protein to produce an effective enzyme. Later the synthesis (more...)

According to the exon theory of genes and other ‘introns early’ hypotheses, all genomes originally possessed introns. But we know that bacterial genomes do not have GU-AG introns, so if these hypotheses are correct then we must assume that for some reason introns became lost from the ancestral bacterial genome at an early stage in its evolution. This is a stumbling block because it is difficult to envisage how a large number of introns could be lost from a genome without risking the disruption of many gene functions. If an intron is removed from a gene with any imprecision then a part of the coding region will be lost or a frameshift mutation will occur, both of which would be expected to inactivate the gene. The ‘introns late’ hypothesis avoids this problem by proposing that, to begin with, no genes had introns, these structures invading the early eukaryotic nuclear genome and subsequently proliferating into the numbers seen today. The similarities between the splicing pathways for GU-AG and Group II introns (Section 10.2.3) suggest that the invaders that gave rise to GU-AG introns might well have been Group II sequences that escaped from organelle genomes (Eickbush, 2000). However, the similarity between GU-AG and Group II introns does not prove the ‘introns late’ view, because it is equally possible to devise an ‘introns early’ model, different to the exon theory of genes, in which Group II sequences gave rise to GU-AG introns, but at a very early stage in genome evolution.
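The consequence of imprecise intron loss is easy to demonstrate with a toy gene. In the sketch below the sequence is hypothetical (two short exons in upper case separated by an intron in lower case that begins with gt and ends with ag), and the helper functions are our own; no attempt is made to model the mechanism by which an intron might be lost.

```python
def codons(seq):
    """Split a coding sequence into triplets (any trailing bases are kept)."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def remove_segment(gene, start, end):
    """Delete gene[start:end], mimicking loss of an intron from the DNA."""
    return gene[:start] + gene[end:]

# Hypothetical gene: exon 1, intron, exon 2 (exon 2 ends with a TAA stop codon).
exon_1, intron, exon_2 = "ATGGCT", "gtaagtttcag", "GAACGTTAA"
gene = exon_1 + intron + exon_2
i_start, i_end = len(exon_1), len(exon_1) + len(intron)

precise = remove_segment(gene, i_start, i_end)
one_short = remove_segment(gene, i_start, i_end - 1)  # leaves 1 intron base
one_over = remove_segment(gene, i_start, i_end + 1)   # removes 1 exon base

print(codons(precise))    # ['ATG', 'GCT', 'GAA', 'CGT', 'TAA']
print(codons(one_short))  # ['ATG', 'GCT', 'gGA', 'ACG', 'TTA', 'A']
print(codons(one_over))   # ['ATG', 'GCT', 'AAC', 'GTT', 'AA']
```

Only the precise excision preserves the reading frame through to the stop codon. An error of a single nucleotide in either direction changes every codon downstream of the former intron, which is why large-scale intron loss is difficult to envisage without widespread gene inactivation.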

The current evidence disproves neither hypothesis

One of the reasons why the debate regarding the origin of GU-AG introns has continued for over 20 years is that evidence in support of either hypothesis has been difficult to obtain and is often ambiguous. One prediction of ‘introns early’ is that there should be a close similarity between the positions of introns in homologous genes from unrelated organisms, because all these genes are descended from an ancestral intron-containing gene (Figure 15.18). Early support for ‘introns early’ came when this was shown to be the case for four introns in animal and plant genes for triosephosphate isomerase (Gilbert et al., 1986). However, when a larger number of species was examined the positions of the introns in this gene became less easy to interpret: it appeared that introns had been lost in some lineages but gained in others. This scenario fits both ‘introns early’ and ‘introns late’ as both allow for the loss, gain or repositioning of introns by recombination events occurring in individual lineages. When many genes in many organisms are examined the general picture that emerges is that intron numbers have gradually increased during the evolution of animal genomes, this being put forward as evidence for ‘introns late’ (Palmer and Logsdon, 1991), despite the fact that animal mitochondrial genomes do not contain Group II introns that could supplement the existing nuclear introns by repeated invasions. Intron numbers must therefore have increased by recombination events, which is possible with both hypotheses.

Figure 15.18. One prediction of the ‘introns early’ hypothesis is that the positions of introns in homologous genes should be similar in unrelated organisms, because all these genes are descended from an ancestral intron-containing gene.


An alternative approach has been to try to correlate exons with protein structural domains, as the ‘introns early’ hypothesis predicts that such a link should be evident, even allowing for the fuzzying effects of evolution since the primitive minigenes were assembled into the first real genes. Again, the first evidence to be obtained supported ‘introns early’. A study of vertebrate globin proteins concluded that each of these comprises four structural domains, the first corresponding to exon 1 of the globin gene, the second and third to exon 2, and the fourth to exon 3 (Figure 15.19; Go, 1981). The prediction that there should be globin genes with another intron that splits the second and third domains was found to be correct when the leghemoglobin gene of soybean was shown to have an intron at exactly the expected position (Jensen et al., 1981). Unfortunately, as more globin genes were sequenced more introns were discovered - more than ten in all. The positions of the majority of these do not correspond to junctions between domains.

Figure 15.19. A vertebrate globin gene showing the relationship between the three exons and the four domains of the globin protein.


The globin genes therefore conform with the general principle that emerged from our discussion of domain shuffling (Section 15.2.1): that in most cases there are no clear links between gene exons and protein structural domains. But is our definition of ‘structural domain’ correct? A structural domain within a protein may not simply correspond with a group of secondary structures such as α-helices and β-sheets. A more subtle interpretation might be that a structural domain is a polypeptide segment whose amino acids are less than a certain distance apart in the protein's tertiary structure. It has been suggested that when this definition is adopted there is a better correlation between structural domain and exon (de Souza et al., 1996).
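The distance-based definition can be stated as a simple test on three-dimensional coordinates. The sketch below is an illustration only: the coordinates, the threshold and the function `is_compact` are hypothetical and are not taken from de Souza et al. (1996).

```python
from itertools import combinations
from math import dist

def is_compact(coords, start, end, max_distance):
    """True if every pair of residues in coords[start:end] lies less than
    max_distance apart in three-dimensional space."""
    segment = coords[start:end]
    return all(dist(a, b) < max_distance for a, b in combinations(segment, 2))

# Hypothetical residue coordinates (arbitrary units) for a six-residue chain.
coords = [(0, 0, 0), (3, 0, 0), (3, 3, 0), (0, 3, 0), (20, 3, 0), (23, 3, 0)]

print(is_compact(coords, 0, 4, max_distance=5))  # True
print(is_compact(coords, 2, 6, max_distance=5))  # False
```

Under this definition a domain is identified from the folded structure rather than from the arrangement of α-helices and β-sheets, so its boundaries need not coincide with those of the secondary structure units.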

Box 15.3

The role of non-coding DNA. The presence of extensive amounts of non-coding DNA in eukaryotic genomes (see Box 1.4) is a puzzle for molecular evolutionists. Why is this apparently superfluous DNA tolerated? One possibility is that the non-coding DNA has (more...)

15.4. The Human Genome: the Last 5 Million Years

Although the evolutionary history of humans is controversial, it is generally accepted that our closest relative among the primates is the chimpanzee and that the most recent ancestor that we share with the chimps lived 4.6–5.0 million years ago (Takahata, 1995). Since the split, the human lineage has embraced two genera - Australopithecus and Homo - and a number of species, not all of which were on the direct line of descent to Homo sapiens (Figure 15.20). The result is us, a novel species in possession of what are, at least to our eyes, important biological attributes that make us very different from all other animals. So how different are we from the chimpanzees?

Figure 15.20. One possible scheme for the evolution of modern humans from australopithecine ancestors. There are many controversies in this area of research and several different hypotheses have been proposed for the evolutionary relationships between different fossils. (more...)

As far as our genomes are concerned the answer is ‘about 1.5%’, this being the extent of the nucleotide sequence dissimilarity between humans and chimpanzees (Hacia, 2001). Within the coding DNA the difference is less than 1.5%, with many genes having identical sequences in the two genomes, but even in the non-coding regions the dissimilarity is rarely more than 3%. Only a few clear differences have been discovered:

  • Humans lack a 92-bp segment of the gene for the N-glycolyl-neuraminic acid hydroxylase and so cannot synthesize the hydroxylated form of N-glycolyl-neuraminic acid, which is present on the surfaces of some chimpanzee cells (Chou et al., 1998; Muchmore et al., 1998). This may have an effect on the ability of certain pathogens to enter human cells, and could possibly influence some types of cell-cell interaction, but the difference is not thought to be particularly significant.
  • Several recent gene duplications have occurred, resulting in gene copies that can be described as human-specific or chimpanzee-specific, as they are present in only one or the other genome. However, as far as gene functions are concerned these new genes are not significant because they have not yet had time to accumulate mutations to any great extent and so, in effect, are simply second copies of the genes from which they were derived.
  • Some components of the non-coding DNA in the two genomes have diverged extensively, illustrating how quickly repetitive DNA can evolve. For example, the alphoid DNA sequences present at human centromeres (Section 2.2.1) are quite different from the equivalent sequences in chimpanzee and gorilla chromosomes (Archidiacono et al., 1995). The human genome also contains novel versions of the Alu element (Section 2.4.2; Zietkiewicz et al., 1994).
  • Human and chimpanzee genomes have undergone a few rearrangements, as revealed when the chromosome banding patterns are compared. The most dramatic difference is that human chromosome 2 is two separate chromosomes in chimpanzees (Figure 15.21), so chimpanzees, as well as other apes, have 24 pairs of chromosomes whereas humans have just 23 pairs. Four other chromosomes - human numbers 5, 6, 9 and 12 - also have visible differences to their chimpanzee counterparts, although the other 18 chromosomes appear to be very similar if not identical (Yunis and Prakash, 1982).
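The ‘about 1.5%’ figure quoted before this list is a measure of how many aligned nucleotide positions differ between the two genomes. A minimal sketch of the calculation is given below. The sequences are hypothetical, the function name is our own, and the sketch assumes an ungapped alignment, so insertions and deletions are not counted.

```python
def percent_dissimilarity(seq_1, seq_2):
    """Percentage of aligned positions at which two sequences differ."""
    if len(seq_1) != len(seq_2):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq_1, seq_2))
    return 100 * mismatches / len(seq_1)

# Hypothetical aligned fragments of 40 positions that differ at one site.
human = "ACGTTGCAAG" * 4
chimp = human[:17] + "T" + human[18:]

print(percent_dissimilarity(human, chimp))  # 2.5
```

A dissimilarity of 1.5% corresponds to an average of 15 differences in every 1000 aligned nucleotides.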

Figure 15.21. Human chromosome 2 is the product of a fusion between two chimpanzee chromosomes. For more details about the banding patterns of these chromosomes, from which the fusion is deduced, see Strachan and Read (1999).

These differences are interesting as far as genome evolution is concerned but none of them reveals anything about the basis of the special biological attributes possessed by humans. This question - what makes us different from chimpanzees and other apes - is perplexing molecular biologists, who have been frustrated by the absence of a sequencing project for any of the ape genomes (Gibbons, 1998). But this is only part of the problem because many of the key differences between humans and apes are likely to lie with subtle changes in the expression patterns of genes involved in developmental processes and in specification of interconnections within the nervous system. Differences in the expression patterns of genes in the brains of humans and chimps have been revealed by microarray analysis (see Technical Note 5.1; Normile, 2001), but understanding how these differences relate to brain function will not be easy. It is clear, however, that what makes us human is probably not the human genome itself, but the way in which the genome functions.

Study Aids For Chapter 15

Self study questions


  1. Starting at 4.6 billion years ago, outline the key periods in genome evolution.
  2. Summarize current thinking regarding the processes that led to evolution of the first genomes. Be careful to distinguish between the RNA world and the DNA world and to indicate how the transition from the former to latter is thought to have occurred.
  3. Which periods during the last 1.5 billion years are linked to sudden increases in gene number?
  4. Describe how the formation of an autopolyploid could result in an increase in gene number.
  5. What indications are there that genome duplication has been important during the evolutionary histories of present-day genomes?
  6. Using diagrams, distinguish between the four processes that could lead to gene duplication.
  7. Explain how the globin gene superfamily illustrates the importance of gene duplication in genome evolution.
  8. Discuss the impact of gene duplication on the evolution of the homeotic selector genes of eukaryotes.
  9. Define the term ‘concerted evolution’ and state why this process is important in the evolution of some multigene families.
  10. Describe, with examples, the processes of domain duplication and domain shuffling.
  11. Discuss the evidence for lateral gene transfer between prokaryotic organisms. To what extent is lateral gene transfer likely to have contributed to genome evolution in eukaryotes?
  12. Outline the impact that transposable elements can have on genome evolution.
  13. Distinguish between the ‘introns early’ and ‘introns late’ hypotheses. What evidence is there to support these hypotheses?
  14. List the ways in which the human genome differs from that of chimpanzees. What are the likely genetic explanations for the important differences between the biological attributes of humans and chimps?

Problem-based learning


  1. How unique is life?
  2. Are the examples of domain duplication and domain shuffling given in Section 15.2.1 special cases or are they representative of genome evolution in general?
  3. ‘Among animals, the species barriers are less easy to cross and it is difficult to find clear evidence for lateral gene transfer of any kind.’ This statement reflects current thinking but is not secure. Indeed, one of the publications describing the draft human genome sequence lists 31 human genes that might have been acquired from bacteria by lateral gene transfer (IHGSC [International Human Genome Sequencing Consortium] [2001] Initial sequencing and analysis of the human genome. Nature, 409, 860–921). Explore the controversies surrounding the influence of lateral gene transfer on the composition of the human genome.
  4. Evaluate the ‘introns early’ and ‘introns late’ hypotheses.
  5. Examine the differences between the human and chimpanzee genomes and provide a more detailed account of the reasons for the biological differences between humans and chimps.


References

  1. Amores A, Force A, Yan Y-L. et al. Zebrafish hox clusters and vertebrate genome evolution. Science. (1998);282:1711–1714. [PubMed: 9831563]
  2. Archidiacono N, Antonacci R, Marzella R, Finelli P, Lonoce A, Rocchi M. Comparative mapping of human alphoid sequences in great apes using fluorescence in situ hybridization. Genomics. (1995);25:477–484. [PubMed: 7789981]
  3. Bancroft I. Duplicate and diverge: the evolution of plant genome microstructure. Trends Genet. (2001);17:89–93. [PubMed: 11173118]
  4. Barker WC, Dayhoff MO. Evolutionary and functional relationships of homologous physiological mechanisms. BioScience. (1980);30:593–600.
  5. Bartel DP, Unrau PJ. Constructing an RNA world. Trends Cell Biol. (1999);9:M9–M13. [PubMed: 10611672]
  6. Beier M, Reck F, Wagner T, Krishnamurthy R, Eschenmoser A. Chemical etiology of nucleic acid structure: comparing pentopyranosyl-(2′→4′) oligonucleotides with RNA. Science. (1999);283:699–703. [PubMed: 9924032]
  7. Bird AP. Gene number, noise reduction and biological complexity. Trends Genet. (1995);11:94–100. [PubMed: 7732579]
  8. Brooke NM, Garcia-Fernàndez J, Holland PWH. The ParaHox gene cluster is an evolutionary sister of the Hox gene cluster. Nature. (1998);392:920–922. [PubMed: 9582071]
  9. Brookfield JFY. Genetic redundancy. Adv. Genet. (1997);36:137–155. [PubMed: 9348654]
  10. Cavalier-Smith T. Nuclear volume control by nucleoskeletal DNA, selection for cell volume and cell growth rate and the solution to the DNA C-value paradox. J. Cell Sci. (1978);34:247–278. [PubMed: 372199]
  11. Chou HH, Takematsu H, Diaz S. et al. A mutation in human CMP-sialic acid hydroxylase occurred after the Homo-Pan divergence. Proc. Natl Acad. Sci. USA. (1998);95:11751–11756. [PMC free article: PMC21712] [PubMed: 9751737]
  12. Csermely P. Proteins, RNAs and chaperones in enzyme evolution: a folding perspective. Trends Biochem. Sci. (1997);22:147–149. [PubMed: 9175467]
  13. de Souza SJ, Long M, Schoenbach L, Roy SW, Gilbert W. Intron positions correlate with module boundaries in ancient proteins. Proc. Natl Acad. Sci. USA. (1996);93:14632–14636. [PMC free article: PMC26186] [PubMed: 8962105]
  14. Doolittle RF. The evolution of the vertebrate plasma proteins. Biol. Bull. (1987);172:269–283.
  15. Eickbush TH. Introns gain ground. Nature. (2000);404:940–943. [PubMed: 10801107]
  16. Ekland EH, Bartel DP. RNA-catalysed RNA polymerization using nucleoside triphosphates. Nature. (1996);382:373–376. [PubMed: 8684470]
  17. Eschenmoser A. Chemical etiology of nucleic acid structure. Science. (1999);284:2118–2124. [PubMed: 10381870]
  18. Freeland SJ, Knight RD, Landweber LF. Do proteins predate DNA? Science. (1999);286:690–692. [PubMed: 10577226]
  19. Gibbons A. Which of our genes makes us human? Science. (1998);281:1432–1434. [PubMed: 9750111]
  20. Gilbert W. The exon theory of genes. Cold Spring Harbor Symp. Quant. Biol. (1987);52:901–905. [PubMed: 2456887]
  21. Gilbert W, Marchionni M, McKnight G. On the antiquity of introns. Cell. (1986);46:151–153. [PubMed: 2424613]
  22. Go M. Correlation of DNA exonic regions with protein structural units in hemoglobin. Nature. (1981);291:90–92. [PubMed: 7231530]
  23. Hacia JG. Genome of the apes. Trends Genet. (2001);17:637–645. [PubMed: 11672864]
  24. Hartl DL, Lohe AR, Lozovskaya ER. Modern thoughts on an ancyent marinere: function, evolution, regulation. Ann. Rev. Genet. (1997);31:337–358. [PubMed: 9442899]
  25. Henikoff S, Greene EA, Pietrokovski S, Bork P, Attwood TK, Hood L. Gene families: the taxonomy of protein paralogs and chimeras. Science. (1997);278:609–614. [PubMed: 9381171]
  26. Jensen EO, Paludan K, Hyldig-Nielsen JJ, Jorgensen P, Marcker KA. The structure of a chromosomal leghemoglobin gene from soybean. Nature. (1981);291:677–679.
  27. Johnston WK, Unrau PJ, Lawrence MS, Glasner ME, Bartel DP. RNA-catalyzed RNA polymerization: accurate and general RNA-templated primer extension. Science. (2001);292:1319–1325. [PubMed: 11358999]
  28. Kazazian HH. L1 retrotransposons shape the mammalian genome. Science. (2000);289:1152–1153. [PubMed: 10970230]
  29. Li W-H and Graur D (1991) Fundamentals of Molecular Evolution. Sinauer, Sunderland, MA.
  30. Lohse PA, Szostak JW. Ribozyme-catalysed amino-acid transfer reactions. Nature. (1996);381:442–444. [PubMed: 8632803]
  31. Maeda N, Smithies O. The evolution of multigene families: human haptoglobin genes. Ann. Rev. Genet. (1986);20:81–108. [PubMed: 2880559]
  32. McDonald JF. Transposable elements: possible catalysts of organismic evolution. Trends Ecol. Evol. (1995);10:123–126. [PubMed: 21236980]
  33. Messier W, Li S-H, Stewart C-B. The birth of microsatellites. Nature. (1996);381:483. [PubMed: 8632820]
  34. Miller SL. A production of amino acids under possible primitive Earth conditions. Science. (1953);117:528–529. [PubMed: 13056598]
  35. Muchmore EA, Diaz S, Varki A. A structural difference between the cell surfaces of humans and the great apes. Am. J. Phys. Anthropol. (1998);107:187–198. [PubMed: 9786333]
  36. Normile D. Gene expression differs in human and chimp brains. Science. (2001);292:44–45. [PubMed: 11294209]
  37. Nowak MA, Boerlijst MC, Cooke J, Maynard Smith J. Evolution of genetic redundancy. Nature. (1997);388:167–170. [PubMed: 9217155]
  38. Ohno S (1970) Evolution by Gene Duplication. George Allen and Unwin, London.
  39. Orgel LE. A simpler nucleic acid. Science. (2000);290:1306–1307. [PubMed: 11185405]
  40. Orgel LE, Crick FHC. Selfish DNA: the ultimate parasite. Nature. (1980);284:604–607. [PubMed: 7366731]
  41. Palmer JD, Logsdon JM. The recent origin of introns. Curr. Opin. Genet. Dev. (1991);1:470–477. [PubMed: 1822279]
  42. Piskur J. Origin of the duplicated regions in the yeast genome. Trends Genet. (2001);17:302–303. [PubMed: 11377778]
  43. Purugganan MD, Wessler S. The splicing of transposable elements and its role in intron evolution. Genetica. (1992);86:295–303. [PubMed: 1334914]
  44. Robertson HM, Zumpano KL, Lohe AR, Hartl DL. Reconstructing the ancient mariners of humans. Nature Genet. (1996);12:360–361. [PubMed: 8630486]
  45. Robertson MP, Ellington AD. How to make a nucleotide. Nature. (1998);395:223–225. [PubMed: 9751043]
  46. Romero D, Palacios R. Gene amplification and genomic plasticity in prokaryotes. Ann. Rev. Genet. (1997);31:91–111. [PubMed: 9442891]
  47. Stavenhagen JB, Robins DM. An ancient provirus has imposed androgen regulation on the adjacent mouse sex limited protein gene. Cell. (1988);55:247–254. [PubMed: 3167981]
  48. Strachan T and Read AP (1999) Human Molecular Genetics, 2nd edition. BIOS Scientific Publishers, Oxford.
  49. Szathmáry E. Coding coenzyme handles: a hypothesis for the origin of the genetic code. Proc. Natl Acad. Sci. USA. (1993);90:9916–9920. [PMC free article: PMC47683] [PubMed: 8234335]
  50. Szathmáry E, Maynard Smith J. The origin of chromosomes. II. Molecular mechanisms. J. Theoret. Biol. (1993);164:447–454. [PubMed: 7505372]
  51. Takahata N. A genetic perspective on the origin and history of humans. Ann. Rev. Ecol. System. (1995);26:343–372.
  52. Taylor JS, Van de Peer Y, Meyer A. Genome duplication, divergent resolution and speciation. Trends Genet. (2001);17:299–301. [PubMed: 11377777]
  53. Unrau PJ, Bartel DP. RNA-catalysed nucleotide synthesis. Nature. (1998);395:260–263. [PubMed: 9751052]
  54. Vision TJ, Brown DG, Tanksley SD. The origins of genomic duplications in Arabidopsis. Science. (2000);290:2114–2117. [PubMed: 11118139]
  55. Wächtershäuser G. Before enzymes and templates: theory of surface metabolism. Microbiol. Rev. (1988);52:452–484. [PMC free article: PMC373159] [PubMed: 3070320]
  56. Wächtershäuser G. Groundworks for an evolutionary biochemistry - the iron sulfur world. Prog. Biophys. Mol. Biol. (1992);58:85–201. [PubMed: 1509092]
  57. Wagner A. Birth and death of duplicated genes in completely sequenced eukaryotes. Trends Genet. (2001);17:237–239. [PubMed: 11335019]
  58. Walter P, Keenan R, Schmitz U. SRP - where the RNA and membrane worlds meet. Science. (2000);287:1212–1213. [PubMed: 10712156]
  59. Woese CR. A proposal concerning the origin of life on the planet Earth. J. Mol. Evol. (1979);13:95–101. [PubMed: 480373]
  60. Wolfe KH, Shields DC. Molecular evidence for an ancient duplication of the entire yeast genome. Nature. (1997);387:708–713. [PubMed: 9192896]
  61. Yunis JJ, Prakash O. The origin of Man: a chromosomal pictorial legacy. Science. (1982);215:1525–1530. [PubMed: 7063861]
  62. Zietkiewicz E, Richer C, Makalowski W, Jurka J, Labuda D. A young Alu subfamily amplified independently in human and African great apes lineages. Nucleic Acids Res. (1994);22:5608–5612. [PMC free article: PMC310123] [PubMed: 7838713]
  63. Zuckerkandl E. Gene control in eukaryotes and c-value paradox: ‘excess’ DNA as an impediment to transcription of coding sequences. J. Mol. Evol. (1976);9:73–104. [PubMed: 798041]

Further Reading

  1. Futuyma DJ (1998) Evolutionary Biology, 3rd edition. Sinauer, Sunderland, MA. —An accessible description of evolutionary biology.
  2. Gesteland RF, Cech TR and Atkins JF (eds) (1999) The RNA World, 2nd edition. Cold Spring Harbor Laboratory Press, Cold Spring Harbor.
  3. Jackson M, Strachan T and Dover GA (1996) Human Genome Evolution. BIOS Scientific Publishers, Oxford. —An advanced treatment of the subject.
  4. Li W-H (1997) Molecular Evolution. Sinauer, Sunderland, MA. —Detailed descriptions of many of the topics covered in this chapter.
  5. Maynard Smith J and Szathmáry E (1995) The Major Transitions in Evolution. WH Freeman, Oxford. —A remarkable book that begins with the origin of life and ends with the evolution of human language.
  6. Otto SP, Whitton J. Polyploid incidence and evolution. Ann. Rev. Genet. (2000);34:401–437. [PubMed: 11092833]
Copyright © 2002, Garland Science.
Bookshelf ID: NBK21112

