NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Strachan T, Read AP. Human Molecular Genetics. 2nd edition. New York: Wiley-Liss; 1999.

Chapter 14. Our place in the tree of life

By definition, the evolutionary origins of the human genome, and all genomes, are as old as life itself. The present chapter is not intended as an introduction to basic evolutionary theory or an overview of molecular evolutionary genetics per se. As a result, many fascinating areas are not covered, such as the idea that RNA used to be the primary information molecule before being superseded by DNA or the idea that the genetic code was initially a doublet code before evolving into the familiar triplet code, etc. The interested reader is advised to consult one of the more general molecular evolutionary genetics textbooks (see Further reading). Instead, this chapter is meant to focus on how comparative analyses of present day genomes have shed light on the evolutionary origin of human DNA. Much of the data are derived from comparisons of mammalian genomes, although comparison with more distant genomes is occasionally used to explain certain footprints of evolution, as in the origin of mitochondrial DNA and introns. A final section considers our recent evolutionary past and our uniqueness when compared with mammalian models, notably the mouse (an important model for understanding early human development and also human disease - see Chapter 21) and primates, our closest living relatives.

14.1. Evolution of the mitochondrial genome and the origin of eukaryotic cells

14.1.1. The mitochondrial genome most likely originated as a result of endocytosis of a prokaryotic cell by a eukaryotic cell precursor, but the nature of these cells is uncertain

In addition to the nuclear genome, mitochondria have a genome, as do the chloroplasts of plant cells. The organization and expression of mitochondrial and chloroplast genomes shows considerable similarities to that of prokaryotic cells (see below), suggesting that the evolution of eukaryotic cells involved a process in which a precursor to the eukaryotic cell (protoeukaryote) engulfed (endocytosed) some type of prokaryotic cell (the symbiont). Such a process is thought to have conferred a selective advantage for the resulting new cell and has been termed endosymbiosis.

The genomes of the engulfed prokaryotic cell and the host cell are envisaged to have given rise to the present day mitochondrial and nuclear genomes respectively (Figure 14.1A). However, the size of the mitochondrial genome of present day cells, especially in the case of animal cells, is very small. For example, the human mitochondrial genome is about 16 kb and has only 37 genes; the vast majority of mitochondrial proteins and functions are specified by genes in the nuclear genome (Section 7.1.1). Present day prokaryotic cells comprise two quite different types of cell: eubacteria and archaea (Box 14.1). They typically contain one or a few Mb of DNA and contain several hundreds or thousands of genes. As present day mitochondrial genomes are very much smaller than this, theories of mitochondrial genome origin envisage that many of the genes originally present in the engulfed cell were transferred to the genome of the host cell (Figure 14.1B; Doolittle, 1998). Currently there are two classes of hypothesis to explain the origin of mitochondria and the eukaryotic cell, exemplified by the serial endosymbiont hypothesis and the hydrogen hypothesis.

Figure 14.1. The human mitochondrial genome probably originated following endocytosis of a prokaryotic cell by a eukaryotic precursor cell. (A) General endocytosis model whereby the mitochondrial genome is imagined to originate by endocytosis of a prokaryotic cell. ...

Box 14.1. The three kingdoms of life. The subdivision of cells into prokaryotes and eukaryotes as described in Section 2.1.1 is a simplification. During the last two decades or so, it has become clear that there are three kingdoms of life, the eukaryotes and two ...

The serial endosymbiont hypothesis

The first version of the endosymbiont hypothesis was developed 30 years ago and was invoked to explain several features of the organization and expression of the mitochondrial genome which bore a resemblance to prokaryote genomes: small size, absence of introns, a very high percentage of coding DNA, a conspicuous lack of repeated DNA sequences and comparatively small prokaryotic-like rRNA genes. Phylogenetic analyses of rRNA sequences suggested that mitochondria were particularly closely related to the α subdivision of purple bacteria. Consequently, mitochondria were believed to have originated as a result of endocytosis by anaerobic eukaryotic precursor cells of an aerobic eubacterium of this type (with an oxidative phosphorylation system). This was imagined to have occurred about 1.5 billion years ago when oxygen started to accumulate in significant quantities in the Earth's atmosphere; cells which acquired the capacity for oxidative phosphorylation would have been at a strong selective advantage.

More recently, it has become clear that the nuclear genome of eukaryotes is an evolutionary chimera which is genetically related to both archaeal and eubacterial genomes. For example, eukaryotic genes involved in information transfer (replication, transcription, translation, etc.) are largely derived from archaea; however, operational genes (those involved in metabolism and biosynthesis of cofactors, amino acids, fatty acids, etc.) appear to be descended from eubacteria (Rivera et al., 1998). The modern serial endosymbiont hypothesis envisages that eukaryotic cells evolved by a series of endocytoses. What is not clear in this model is when the prokaryotic cell containing the precursor mitochondrial genome was endocytosed, whether by a nucleated cell resembling a eukaryote, or by an archaeon (see Lopez-Garcia and Moreira, 1999 and Gray et al., 1999 for references).

The hydrogen (syntrophy) hypothesis

The serial endosymbiont model considers that the precursor to the nuclear genome arose first and subsequently a precursor to the mitochondrial genome was captured by endocytosis, maybe even by a nucleated host cell. The hydrogen hypothesis, by contrast, considers that there could have been a simultaneous origin for the precursors of the nuclear and mitochondrial genomes, and that the respiration of ancestral mitochondria was anaerobic (Martin and Muller, 1998; Lopez-Garcia and Moreira, 1999). In this hypothesis eukaryotes are suggested to have arisen by association of an anaerobic, strictly autotrophic archaeon which was hydrogen-requiring (possibly a methanogen) with a hydrogen-producing eubacterium, such as an α-proteobacterium. The anticipated driving force for this association was a symbiotic metabolic association (syntrophy). Subsequently, to avoid pointless cycling of metabolites in its cytoplasm, the host lost its autotrophic pathway and an irreversible heterotroph emerged, containing ancestral mitochondria but no longer dependent on hydrogen. More efficient oxygenic respiration was then adopted by many such organisms and aerobic mitochondria evolved.

The hydrogen hypothesis was founded on several important observations. First, eukaryotes which do not possess mitochondria possess eubacterial-like metabolic enzymes (in addition to other known eubacterial-like genes). Secondly, hydrogenosomes (small membrane-bounded hydrogen-producing organelles found in some anaerobic protozoa and some types of fungi) appear to share a common ancestry with mitochondria (Gray et al., 1999; Lopez-Garcia and Moreira, 1999). It is perhaps significant, too, that whereas most eukaryotes use histones to compact their nuclear DNA, the only prokaryotes that have histones and nucleosomes are the Euryarchaeota, the division of the Archaea that includes the hydrogen-consuming methanogens. This field is a fast-moving one and so interested readers are advised to consult recent review articles.

14.1.2. The mitochondrial genetic code most likely evolved as a result of reduced selection pressure in response to a greatly diminished coding capacity

The human mitochondrial genetic code is slightly different from the ‘universal’ genetic code that is used in the expression of polypeptides encoded by prokaryotic genomes, eukaryotic nuclear genomes and plant mitochondrial genomes (Figure 1.22). In addition, although it is identical to the genetic codes of other mammalian mitochondrial genomes, it differs in some respects from the nonuniversal genetic codes used in the mitochondria of other eukaryotes, such as Drosophila and yeast cells. As described in the preceding section, theories of mitochondrial origins have envisaged that genes were transferred from the precursor mitochondrial genome to the nuclear genome. This may have occurred by successive processes of organelle lysis with incorporation of DNA into the nuclear genome, loss of organelle copies and fixation of gene loss by genetic drift (Doolittle, 1998). By whatever mechanism, the loss in genetic capacity could have relaxed the normal selection pressure that applies to large genomes.

From the above, one would expect that the original genome of the endocytosed prokaryotic cell would, like all large genomes, have been subject to strong conservative selection pressure. The universal genetic code would have been used because even slight alterations in the code could result in lack of function (or aberrant function) for large numbers of vitally important gene products, resulting in cell death. However, as the coding potential steadily diminished by gene transfer to the nuclear genome, there would have been progressively less selection pressure to conserve the original genetic code. Eventually a severely depleted genome resulted (only 13 genes in the human mitochondrial genome encode polypeptides). Slight altering of the otherwise universal genetic code could therefore be achieved without provoking disastrous consequences, because only a tiny number of polypeptides would be involved. It is also likely that the codons which have been altered (see Figure 1.22) have not been used extensively in locations where amino acid substitutions would have been deleterious.
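The argument above can be made concrete with a small sketch. It lists the codons that are read differently in the vertebrate (including human) mitochondrial code and in the standard code, and then finds the positions in a reading frame that a change of code would reinterpret. The reading frame `toy_orf` and the function names are invented for illustration only; they are not taken from Figure 1.22 or from any real gene.

```python
# Codons that are read differently in the vertebrate mitochondrial code
# compared with the standard ("universal") code. RNA codons are used.
STANDARD = {"UGA": "Stop", "AUA": "Ile", "AGA": "Arg", "AGG": "Arg"}
VERTEBRATE_MITO = {"UGA": "Trp", "AUA": "Met", "AGA": "Stop", "AGG": "Stop"}


def reassigned_codons(standard, variant):
    """Return the codons whose meaning differs between two code tables."""
    return sorted(c for c in standard if standard[c] != variant[c])


def affected_positions(codons, changed):
    """Return the indices in a list of codons that would be reinterpreted."""
    return [i for i, codon in enumerate(codons) if codon in changed]


changed = reassigned_codons(STANDARD, VERTEBRATE_MITO)

# Hypothetical six-codon reading frame, invented purely for illustration.
toy_orf = ["AUG", "UGA", "GCU", "AUA", "CCC", "AGA"]
hits = affected_positions(toy_orf, changed)

print(changed)  # codons with altered meaning
print(hits)     # positions in toy_orf that a code change would affect
```

With only 13 polypeptide-encoding genes to scan, the number of such positions that a code change must tolerate is small, which is the point made in the paragraph above.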

14.2. Evolution of the eukaryotic nuclear genome: genome duplication and large-scale chromosomal alterations

As described in Section 14.1.1 the nuclear genome of eukaryotes is thought to have initially evolved as a mixture of archaeal genes (involved in information transfer) and eubacterial genes (involved in metabolism and other basic cellular functions). As eukaryotes developed into complex multicellular organisms, the number of genes and size of the nuclear genome increased and various other properties were altered, notably the amount of repetitive DNA and the fraction of coding DNA (see Table 14.1). The transition from the DNA of a typical simple eukaryotic cell precursor to the DNA of a human cell is therefore thought to have involved a huge increase in the size of the genome and a sizeable increase in gene number and in the percentage of noncoding and repetitive DNA. Different mechanisms have been envisaged to contribute to the large increase in genome size:

Table 14.1. Differences in DNA organization in the cells of simple and complex eukaryotes.

  • Rare duplications of the whole genome.
  • Frequent subgenomic duplication events resulting in gene and exon duplication. Often such events occur at the subchromosomal level as a result of unequal crossover or unequal sister chromatid exchange, but interchromosomal exchanges are not uncommon, including retrotransposition, translocations and large-scale duplicative transpositions (see Section 14.2.3).
  • Frequent subgenomic duplication events leading to increase in the amount of noncoding DNA. An increase in the amount of noncoding DNA separating exons and genes is thought to have occurred principally by retrotransposition of repetitive elements such as Alu and LINE1 sequences.

In each case, the increase in genome size must have been accomplished without initially compromising the functions of the original DNA set. Instead, by providing additional genes, subsequent mutations could result in comparatively rapid sequence divergence: at each duplicated gene locus, one gene is surplus to requirements and so can diverge rapidly because of the absence of selection pressure to conserve function. In some cases, such diverged genes may have acquired novel functions which could be selectively advantageous. In many cases, however, the additional gene sequences would be expected to acquire deleterious mutations and degenerate into nonfunctional pseudogenes (Figure 14.2).

Figure 14.2. Gene duplication can lead to the acquisition of novel function or the formation of a pseudogene. Duplication of gene A results in two equivalent gene copies. Selection pressure need be applied to only one gene copy (top) to maintain the presence of the ...

14.2.1. Human genome evolution may have involved ancient genome duplication events, but the evidence has been obscured by subsequent chromosome and DNA rearrangements

Genome duplication (tetraploidization) is an effective way of increasing genome size and is responsible for the extensive polyploidy of many flowering plants. It can occur naturally when there is a failure of cell division after DNA duplication, so that a cell has double the usual number of chromosomes. Human somatic cells are normally diploid. However, if there is a failure of the first zygotic cell division, constitutional tetraploidy can result. Tetraploidy and other forms of polyploidy can be harmful and are often selected against. However, whole genome duplication via polyploidy has undoubtedly occurred relatively recently in maize, yeast, Xenopus and some types of fish. It is likely therefore that genome duplication occurred several times in the evolution of all eukaryotic lineages, including our own. Following genome duplication, an initially diploid cell could have undergone a transient tetraploid state; subsequent large-scale chromosome inversions and translocations, etc., could result in chromosome divergence and restore diploidy, but now with twice the number of chromosomes (Figure 14.3).

Figure 14.3. Genome duplication can lead to a transient tetraploid state before chromosome divergence restores diploidy. Following duplication of a diploid genome, each pair of homologous chromosomes (e.g. chromosome 1) is now present as a pair of identical pairs. ...

If ancient tetraploidization events were rare in the evolution of the human genome, much intragenomic DNA shuffling would have occurred since the last such event. This means that the original evidence for tetraploidization events would be very largely obscured by subsequent chromosomal inversions, translocations, etc. Additionally, traces of gene duplication following genome duplication are likely to be frequently reduced by silencing of one member of each duplicated gene pair which then degenerates into a pseudogene. After hundreds of millions of years without any function, the nonprocessed pseudogenes generated following the last proposed genome duplication would have diverged so much in sequence as to be not recognizably related to the functional gene, even assuming they have not been lost during occasional rearrangements leading to gene deletion.

Genome duplication events during vertebrate evolution

In the case of vertebrates, two rounds of genome duplication have been envisaged at an early stage of vertebrate evolution, but the current evidence is fragmentary and its significance has been questioned (Skrabanek and Wolfe, 1998). Gene numbers in different species have been taken to provide some evidence for two rounds of tetraploidization in vertebrates: invertebrates such as C. elegans, Drosophila and the sea squirt Ciona intestinalis are estimated to have about 15 000–20 000 genes, about one quarter that expected in mammalian genomes. In addition, many single-copy Drosophila genes have four vertebrate homologues and certain gene clusters appear to have been quadruplicated (see next section).
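The gene-number argument is simple arithmetic, sketched below under the deliberately unrealistic assumption that no duplicated genes are lost. Each whole-genome duplication doubles the gene count, so two rounds give a fourfold increase. The function name is illustrative; the input range is the invertebrate estimate quoted above.

```python
def expected_gene_count(ancestral_genes, rounds):
    """Gene number after `rounds` whole-genome duplications, assuming no gene loss."""
    return ancestral_genes * 2 ** rounds


# Invertebrate gene-number estimates quoted in the text.
low, high = 15_000, 20_000

# Two rounds of tetraploidization quadruple the starting number.
after_two_rounds = (expected_gene_count(low, 2), expected_gene_count(high, 2))
print(after_two_rounds)
```

The resulting range is the fourfold figure implied by the statement that the invertebrate numbers are about one quarter of those expected in mammals. Because many duplicated genes are expected to degenerate into pseudogenes (Figure 14.2), agreement of this kind is suggestive rather than conclusive.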

14.2.2. The existence of some paralogous chromosome segments has been alternatively viewed to reflect ancient genome duplications or subgenomic duplications

A major line of evidence for genome duplications in vertebrates is the existence of closely related gene clusters at different subchromosomal regions in a species, so-called paralogous chromosome segments (Box 14.2 and Figure 14.4). Often such clusters contain genes that have been very highly conserved during evolution because they play crucial roles in early embryonic development. Some examples of quadruplicated gene clusters are known in the human genome and they have been taken as evidence of previous genome duplications. They include clusters containing fibroblast growth factor receptor genes and HOX genes (Skrabanek and Wolfe, 1998).

Box 14.2. Paralogy, orthology and homology. Paralogy means close similarity of nonallelic chromosomal segments or DNA sequences within a species, indicative of a close evolutionary relationship which may or may not have predated speciation. For example, the two ...

Figure 14.4. HSA12 and HSA17 appear to have paralogous chromosomal segments. The approximate positions of some of the related genes mapping to human chromosomes 12 and 17 are indicated. The lengths of the bars mark the maximum uncertainty about the position of any ...

HOX gene clusters

The most cited example of gene organization supporting two rounds of vertebrate genome duplication is the HOX genes, homeobox genes which are involved in specifying the anterior-posterior axis during early development. Humans and other mammals have four HOX gene clusters, each containing about nine such genes (Figure 14.5). The linear order of the genes in a cluster is thought to dictate the temporal order in which they are expressed during development and also their anterior limits of expression along the anterior-posterior axis (Ruddle et al., 1994). The close similarity of the four clusters means that there is clear evidence for paralogous HOX genes, that is, genes on different clusters which are more closely related to each other than they are to their neighbors (Figure 14.5). A single such HOX cluster exists in Amphioxus, with very close similarities to the mammalian HOX gene clusters (Garcia-Fernandez and Holland, 1994). As Amphioxus is thought to be the closest invertebrate relative of the vertebrates, the vertebrate ancestor may have had a single such cluster.

Figure 14.5. The organization of HOX gene clusters in mammals and Amphioxus suggests the possibility of one or two rounds of ancestral genome duplication. Indicated paralogous groups consist of genes with very similar expression patterns and presumably similar functions. ...

While the evidence above is consistent with two successive rounds of genome duplication during vertebrate evolution, contradictory evidence also exists. Analysis of the collagen genes which are closely linked to HOX clusters has suggested that the HoxD cluster branched off first from the ancestral lineage, followed by the HoxA cluster and finally HoxB/HoxC. This would require three separate duplication events and it may suggest that some of these steps were subgenomic duplications rather than whole genome duplications. Analysis of Hox clusters in other species also offers some support for subgenomic duplication. While the pufferfish has the expected four clusters, lamprey have only three such clusters. So either there have been some subgenomic duplication events, or whole clusters have been deleted. The latter possibility may also be suggested by the observation of seven clusters in zebrafish (Meyer and Malaga-Trillo, 1999). Zebrafish may have undergone an additional recent genome duplication (as suggested by the presence of additional gene copies for many other types of gene) with subsequent loss of a single cluster.

14.2.3. There have been numerous major chromosome rearrangements during the evolution of mammalian genomes

In addition to whole genome duplication, a variety of different subgenomic DNA duplication events are possible, resulting from exchanges between nonhomologous chromosomes (chromosomal translocations), unequal exchanges between homologous chromosomes or the sister chromatids of a single chromosome, and DNA copy transposition events. Clearly, some of these mechanisms can also result in loss of genetic material. In addition, other mechanisms (chromosome inversions, simple DNA transpositions and balanced translocations) can result in no net gain or loss of material.

Mammalian genome evolution may have involved frequent subgenomic duplications and also rearrangements without net loss of DNA (Lundin, 1993). Small-scale duplications can be expected to have occurred intrachromosomally by mechanisms such as unequal crossover and unequal sister chromatid exchange, but interchromosomal duplicative transposition events, in which a segment of a chromosome duplicates and the copy is inserted elsewhere in the genome, may have been common too. For example, the pericentromeric regions of chromosomes are known to be unstable and duplications of pericentromeric regions followed by insertion into other chromosomes may be frequent (Eichler, 1998). Other such examples could have occurred on a larger scale from a variety of different chromosomal regions. Even larger-scale duplications could have resulted from ancestral whole chromosome duplications by Robertsonian fusion or subchromosomal duplications followed by pericentric inversions.

Comparisons of the present-day genome organization of humans and other mammals also suggest that large-scale rearrangements may have been frequent, and that karyotype and phenotype evolution can be uncoupled. For example, the Indian muntjac deer (Muntiacus muntjak) has only three types of (very large) chromosome, whereas its very close relative, the Chinese muntjac deer (Muntiacus reevesi), has 23 different chromosomes. The human and mouse karyograms are also quite different from each other and even the highly conserved X chromosome linkage group shows numerous differences in organization between the species (see Figure 14.10). The great apes are extremely closely related to humans but show clear cytogenetic differences as a result of several inversions, a translocation that has occurred exclusively in the human lineage and another that has occurred in the gorilla lineage (see Figure 14.26). Old World monkeys are also closely related to humans but, with the exception of the gibbons, numerous chromosome rearrangements have occurred since divergence from the human lineage.

Figure 14.10. Several X chromosome inversions appear to have occurred since human-mouse divergence. The present-day organization of human and mouse X chromosomes is shown at the bottom. A minimum of eight different homology blocks are defined by the presence in each ...

Figure 14.26. Human chromosome banding patterns are very similar to those of the great apes. The ideograms represent selected primate chromosomes from 1000-band late prophase preparations. Human (H) chromosomes 2, 5 and 6 are illustrated together with the corresponding ...

14.3. Evolution of the human sex chromosomes

14.3.1. Despite their considerable structural differences, substantial blocks of sequence homology between the human X and Y chromosomes suggest a common origin

In mammals, pairs of homologous autosomal chromosomes are structurally virtually identical (homomorphic); chromosome pairing at meiosis is presumed to be facilitated by the high degree of sequence identity between homologs, albeit by a mechanism that is not understood. By contrast, the X and Y chromosomes of humans and other mammalian species are heteromorphic. The human X chromosome is a submetacentric chromosome which contains about 165 Mb of DNA, whereas the Y is acrocentric and is much smaller (containing about 60 Mb of DNA). The human X chromosome contains numerous important genes: on the basis of its size alone, it might be expected to contain about 4000 genes, but the comparative lack of CpG islands on the X chromosome (see Figure 7.4) indicates that the true figure may be substantially smaller. In marked contrast, the great bulk of the Y chromosome is genetically inert and is composed of constitutive heterochromatin consisting of different types of highly and moderately repetitive noncoding DNA. Only a very few functional genes are found on the Y chromosome, including some which have closely related homologues on the X chromosome, and several which are Y-specific and testis-expressed (Figure 14.6).

Figure 14.6. The great bulk of the Y chromosome is genetically inert but many of its few genes show testis-specific expression. (A) Schematic illustration of the Y chromosome showing major and minor pseudoautosomal regions (the 2.6 Mb PAR1 and 320 kb PAR2 respectively) ...

Despite being morphologically distinct, however, the X and Y chromosomes are able to pair during meiosis in male cells, and to exchange sequence information. Sequence exchanges occur within certain small regions of homology between the X and the Y chromosomes, known as pseudoautosomal regions because DNA sequences in these regions do not show strict sex-linked inheritance. In the human sex chromosomes, there are two pseudoautosomal regions (Rappold, 1993; Ried et al., 1998):

  • The major pseudoautosomal region (PAR1) extends over 2.6 Mb at the extreme tips of the short arms of the X and Y and is known to contain a dozen or so genes. It is the site of an obligate crossover during male meiosis which is thought to be required for correct meiotic segregation. This very small region is remarkable for its high recombination frequency (the sex-averaged recombination frequency is 28% which, for a region of only 2.6 Mb, is approximately 10 times the normal recombination frequency). The high figure is, of course, mostly due to the obligatory crossover in male meiosis resulting in a crossover frequency approaching 50% (Figure 14.7). The boundary between the major pseudoautosomal region and the sex-specific region has been shown to map within the XG blood group gene, with the SRY male determinant gene occurring only about 5 kb from the boundary on the Y chromosome (Figure 14.8).
  • The minor pseudoautosomal region (PAR2) extends over 320 kb at the extreme tips of the long arms of the X and Y. Unlike the major pseudoautosomal region, crossover between the X and Y in this region is not so frequent, and is neither necessary nor sufficient for successful male meiosis. At the time of writing only two genes had been identified in this region: IL9R and SYBL1.
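The "approximately 10 times" figure quoted for PAR1 can be checked with a rough calculation. The sketch below uses only the numbers given above, plus two labeled assumptions: a genome-wide average of about 1 cM per Mb (a commonly quoted approximation that is not stated in this section), and the treatment of a 28% recombination frequency as roughly 28 cM, which ignores mapping-function corrections. The variable names are illustrative.

```python
PAR1_SIZE_MB = 2.6               # size of PAR1 quoted in the text
SEX_AVERAGED_RF_PERCENT = 28.0   # sex-averaged recombination frequency in the text
MALE_RF_PERCENT = 50.0           # upper limit implied by one obligate crossover
GENOME_AVG_CM_PER_MB = 1.0       # ASSUMPTION: approximate genome-wide average

# Approximate recombination rate across PAR1, treating 1% recombination as 1 cM.
par1_rate_cm_per_mb = SEX_AVERAGED_RF_PERCENT / PAR1_SIZE_MB

# Fold increase relative to the assumed genome-wide average.
fold_increase = par1_rate_cm_per_mb / GENOME_AVG_CM_PER_MB

# Female frequency implied if the male value sits at its 50% limit.
implied_female_rf_percent = 2 * SEX_AVERAGED_RF_PERCENT - MALE_RF_PERCENT

print(round(par1_rate_cm_per_mb, 1), round(fold_increase, 1), implied_female_rf_percent)
```

Under these assumptions the sex-averaged rate comes out at roughly ten times the genome average, and almost all of the excess is attributable to male meiosis, consistent with the large sex difference shown in Figure 14.7.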

Figure 14.7. The major human pseudoautosomal region is characterized by a high overall recombination frequency and a large sex difference in recombination frequency. The frequency of recombination with sex (i.e. of exchange between the X and Y chromosomes in male ...

Figure 14.8. The boundary of the major human pseudoautosomal region occurs within the XG blood group gene. The human major pseudoautosomal region (PAR1) is a 2.6 Mb region that is common to the tip of the short arms of the X chromosome (top) and Y chromosome (bottom). ...

In addition to the two pseudoautosomal regions, the human X and Y chromosomes show substantial regions of homology elsewhere, including a variety of Xp-Yq and Xq-Yp homologies, as well as Xp-Yp and Xq-Yq homologies. The existence of such homologies suggests that the two chromosomes have evolved from an ancestral homomorphic pair of chromosomes. Clearly, the two chromosomes have subsequently undergone substantial divergence, and sequences that are physically close on one chromosome may have very widely spaced counterparts on the other (Figure 14.9). However, primate comparisons have shown that at least some of the existing X-Y homology results from very recent duplicative transposition events.

Figure 14.9. The human X and Y chromosomes show several regions of homology in addition to the common pseudoautosomal regions. Note the contrasting spatial organization between different pairs of homology blocks. For example, a sequence block in the Yq11.21 region ...

14.3.2. The evolution of the mammalian X chromosome has led to substantial species differences, both in chromosomal DNA organization and the pattern of X inactivation

Human-mouse divergence in gene order

Conservation of synteny between mouse and human is most pronounced in the case of the X chromosome: almost the entire X linkage group appears to be conserved between the two species (known exceptions include three human pseudoautosomal genes with autosomal orthologs in mouse; see below). This remarkable conservation of synteny for X-linked genes also applies to other mammals and appears to be evolutionarily related to the development of a special form of dosage compensation: X chromosome inactivation (see Sections 2.2.3 and 8.5.6). Once established by evolutionary design, or accident (Ohno, 1973), X chromosome inactivation would be expected to ensure conservation of synteny because X-autosome translocations would be selected against (the normal 2:1 ratio of gene dosage for autosomal and X-linked genes would be destroyed). As expected, there is extensive conservation of synteny of the genes on the mouse and human X chromosomes. Nevertheless, there are major differences in gene order: fine mapping of X-linked DNA sequences in the two species indicates regions of homology which can only have been generated by a variety of different chromosomal inversions in the lineages leading to present day mice and humans (Figure 14.10).
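The distinction between conserved synteny and conserved gene order can be illustrated with a toy model. In the sketch below a chromosome is a list of gene labels and an inversion reverses one contiguous segment. The gene labels, the ancestral order and the particular inversions are all hypothetical; they are not the homology blocks of Figure 14.10.

```python
def invert(order, start, end):
    """Return a new gene order with the segment order[start:end] reversed."""
    return order[:start] + order[start:end][::-1] + order[end:]


# Hypothetical ancestral X-linked gene order (labels invented for illustration).
ancestral_x = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]

# Two lineages accumulate different inversions independently.
lineage_a = invert(ancestral_x, 1, 4)
lineage_b = invert(invert(ancestral_x, 2, 7), 0, 3)

# Synteny: the same genes remain on the same chromosome in both lineages.
synteny_conserved = set(lineage_a) == set(lineage_b)

# Gene order: the linear arrangement has diverged.
order_conserved = lineage_a == lineage_b

print(lineage_a)
print(lineage_b)
print(synteny_conserved, order_conserved)
```

Because an inversion only rearranges genes within a chromosome, any number of inversions leaves the linkage group intact while scrambling the order, which is the pattern described for the mouse and human X chromosomes.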

Evolutionary instability of pseudoautosomal regions

The pseudoautosomal regions have not been well-conserved in evolution. There is no equivalent to the human PAR2 in mouse and various other mammals, including primate species. Of the two human genes known to be located in PAR2, IL9R has a mouse ortholog which is autosomal while the mouse ortholog of SYBL1 lies close to the centromere of the X (Figure 14.11). There are also major species differences in the position of the boundary for the major pseudoautosomal region and in the gene content of this region. In humans the major pseudoautosomal boundary occurs within the XG gene (Figure 14.8). In mouse, however, the boundary lies within the Fxy gene, whose human ortholog FXY is located more proximally within the X chromosome-specific region (Figure 14.11; see Perry et al., 1998).

Figure 14.11. Mammalian pseudoautosomal regions have not been well-conserved during evolution. The major pseudoautosomal region is a target for X-autosome translocations with periodic additions to it from autosomal sequences during evolution. It is also a region characterized ...

Several of the genes in the human major pseudoautosomal region are autosomal in other species. This is consistent with the idea that there has been repeated addition of autosomal segments onto the pseudoautosomal regions of the X or the Y chromosome which are then recombined onto the other sex chromosome (Graves et al., 1998; see also Section 14.3.3). The major pseudoautosomal region and neighboring regions are also thought to be comparatively unstable regions. Frequent DNA exchanges result in a high incidence of gene fusions, exon duplications and exon shuffling (see Ried et al., 1998 and Sections 14.5.1 and 14.5.2). Genes within or close to the major pseudoautosomal region are also subject to rapid sequence evolution. The SRY gene located only 5 kb from the major pseudoautosomal boundary is very poorly conserved in evolution (Pamilo and O'Neil, 1997) and standard hybridization-based screening has failed to find mouse orthologs for many of the human genes that map within or close to the major pseudoautosomal region, presumably because of very high levels of sequence divergence (see also Section 14.6.1).

Human-mouse divergence in X inactivation patterns

The rationale for X chromosome inactivation is to act as a dosage compensation mechanism for those X chromosome genes (the vast majority) which do not have homologs on the Y chromosome. However, a small minority of human X-linked genes do have functional homologs on the Y chromosome. Such genes, which are common to both the X and the Y, do not show a sex difference in dosage; as a result, they would be expected to escape X inactivation. Those genes in the major pseudoautosomal region which have been tested for X inactivation status have all been shown to be expressed on both active and inactive X chromosomes. In the minor pseudoautosomal region, the IL9R gene escapes X inactivation but, surprisingly, the SYBL1 gene is subject to X inactivation and is not easily accommodated in proposed schemes for explaining how genes common to the X and Y came to be inactivated (Jegalian and Page, 1998). When present on the Y chromosome, SYBL1 is methylated and not expressed. This type of Y-inactivation constitutes a novel way of maintaining equality in gene dosage between the two sexes.

In addition to the genes in the pseudoautosomal regions, several other human X-linked genes are known to escape X inactivation, including genes which map to proximal Xp and proximal Xq regions, while genes known to map to intermediate locations are often subject to X inactivation (Figure 14.12). In some cases where detailed gene mapping has been conducted, clear evidence exists for multigene domains outside the pseudoautosomal region which escape X inactivation, such as one at Xp11.2 (Miller and Willard, 1998). As expected, many of the human genes which map outside the major pseudoautosomal region and escape X inactivation have functional homologs on the Y chromosome. Some, however, do not. For example, the UBE1 and SB1.8 genes escape inactivation but do not appear to have any homologs on the Y chromosome. Other genes, such as the Kallmann syndrome gene KAL1 and the steroid sulfatase gene STS, do have homologs on the Y chromosome, but these are nonfunctional pseudogenes. It is likely, therefore, that for some genes sex difference in gene dosage is not a problem and is tolerated (Disteche, 1995). In the mouse, however, there are considerable differences in the pattern of X inactivation. For example, the human nonpseudoautosomal genes ZFX, RPS4X and UBE1 all escape inactivation, but the murine homologs Zfx, Rps4 and Ube1X (which unlike the human UBE1 gene has a homolog on the Y) are all subject to X inactivation (Figure 14.12).

Figure 14.12. Genes that escape inactivation on the human X chromosome are widely distributed, but there are some notable differences in X-inactivation patterns in the mouse. Genes that escape inactivation are shown in blue, those that are subject to inactivation are ...

14.3.3. Comparison of genes in distantly related mammals suggests that much of the short arm of the human X chromosome has recently been acquired by X-autosomal translocation

Mammals are classified into two subclasses: prototheria (the monotremes or egg-laying mammals) and theria, which in turn is subdivided into two infraclasses: metatheria (marsupials) and eutheria, a group which includes placental mammals (Figure 14.13). Many eutherian X-linked genes are found to be X-linked in marsupials. However, genes mapping to a large part of human Xp (distal to Xp11.3) have orthologs on autosomes of both marsupials and monotremes. Because the prototherian divergence pre-dated the metatherian-eutherian divergence, the simplest explanation is that at least one large autosomal region was translocated to the X chromosome early in the eutherian lineage.
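The parsimony reasoning in this paragraph can be made explicit with a small sketch. The code below is an illustrative addition rather than part of the original analysis: it applies Fitch's small-parsimony rule to the three-lineage tree described above, treating the chromosomal location of the distal Xp genes as a two-state character. The function name `fitch`, the nested-tuple encoding of the tree and the state labels are assumptions chosen for illustration.

```python
def fitch(tree, states):
    """Fitch small parsimony on a rooted binary tree.

    tree   -- a leaf name (str) or a 2-tuple of subtrees
    states -- dict mapping each leaf name to its observed character state
    Returns (set of most-parsimonious states at this node, minimum changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change is needed here.
        return shared, left_cost + right_cost
    # The children disagree: keep both candidates and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Topology from the text: prototheria diverged before the metatheria-eutheria split.
mammals = (("eutheria", "metatheria"), "prototheria")

# Hypothetical coding of where the orthologs of distal human Xp genes map.
xp_location = {
    "eutheria": "X",
    "metatheria": "autosome",
    "prototheria": "autosome",
}

root_states, changes = fitch(mammals, xp_location)
print(root_states, changes)
```

Under these assumptions the ancestral location is reconstructed as autosomal with a single change on the tree, which corresponds to one translocation in the eutherian lineage. An X-linked ancestor would instead require at least two independent changes.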

Figure 14.13. Mammalian phylogeny. Numbers to the right refer to the approximate dates of divergence of the indicated lineages in millions of years. For example, the lineage giving rise to modern day humans is thought to have diverged from chimpanzees about 5.5 million ...

Translocation of autosomal genes on to the X chromosome will result in the formerly autosomal genes being subject to X inactivation. Not only is one X chromosome shut down in all female cells, but inactivation of the single X chromosome in male cells is required during spermatogenesis. However, certain genes on human Xp would be expected to be crucially important for cell function. For example, the PDHA1 gene encodes the E1α subunit of the pyruvate dehydrogenase complex, an enzyme essential in aerobic energy metabolism. In marsupials the PDHA1 gene is located on an autosome and so is expressed during spermatogenesis. In contrast, the PDHA1 gene in humans (and other eutherian mammals) is X-linked, and is not expressed during spermatogenesis. However, a closely related gene, PDHA2, encodes a testis-specific human isoform which presumably has evolved in response to the silencing of X-linked genes during spermatogenesis. The PDHA2 gene is intron-less and is thought to be an example of a functional processed gene, generated by reverse transcription from the mRNA of the PDHA1 gene (see Figure 7.13 for the general mechanism).

14.3.4. Enforced lack of recombination has led to a severe loss of genetic capacity on the mammalian Y chromosome and thereafter to the development of the X-inactivation mechanism of dosage compensation

The heteromorphic mammalian sex chromosomes most likely evolved from homomorphic autosomes

Distinct sex chromosomes have been independently developed in many animals with disparate evolutionary lineages, including not only mammals, but birds (where the females are ZW, the heterogametic sex, and the males are ZZ, the homogametic sex), and certain species of fish, reptiles and insects. In each case, it is thought that the two sex chromosomes started off as virtually identical autosomes, except that one of them happened to evolve a major sex-determining locus (the SRY locus in humans; see Figure 14.14). Subsequent evolution resulted in the two chromosomes becoming increasingly dissimilar until, in many species, the Y was reduced to a tiny chromosome with only a very few functional genes (see Figure 14.6). There would appear to be evolutionary pressure to adopt the strategy of having two structurally and functionally different sex chromosomes, and it seems that this pressure is gradually driving the Y chromosome to extinction. Eventually, one would expect a switch to a sex determination system where maleness is conferred simply by X : autosome gene dosage and XO individuals are male, as in the case of Drosophila.

Figure 14.14. Mammalian sex chromosomes most likely evolved from a pair of autosomes, one of which acquired a sex-determining allele, leading to recombination suppression and chromosome differentiation. One of a homologous pair of autosomes in an ancestral genome is envisaged ...

Why should the X and Y chromosomes diverge, and why should the Y degenerate?

Clearly to maintain sex differences, recombination needs to be suppressed in the region of the major sex-determining locus (SRY in humans is located in a nonrecombining region just 5 kb proximal to the major pseudoautosomal region; see Figure 14.8). Additionally, environmental circumstances may have offered a selective advantage for breaking down recombination between the sex chromosomes. For example, one trigger could have been the development of sexually antagonistic genes, with alleles which may be of benefit to the heterogametic sex (XY), but harmful to the homogametic sex (XX). If such genes accumulate, then there will be a selective pressure to ensure that they are not transmitted to the homogametic sex, a restriction which can be met if they are present on a nonrecombining Y chromosome. Certainly, recombination between the present-day human X and Y chromosomes is very limited, being very largely confined to the tiny major pseudoautosomal region at the tips of the short arms (PAR1).

From the above, the human Y chromosome can be viewed as an essentially asexual (nonrecombining) component within an otherwise sexual genome (the X chromosome can recombine along its length with a fully paired homolog in female meiosis). Population genetics predicts that a nonrecombining chromosome should degenerate by a process known as Muller's ratchet. If the mutation rate is reasonably high, the absence of recombination means that harmful mutations can gradually accumulate in genes on that chromosome over long evolutionary time scales: mutant alleles may drift to fixation as Y chromosomes with fewer mutants are lost by chance, or they may ‘hitchhike’ along with a favorable allele in a region protected from recombination. Once mutations accumulate in the nonrecombining Y, the loss of function of genes means that there is no selective pressure to retain that DNA segment and the chromosome will gradually contract by a series of deletions (Figure 14.14).
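The ratchet-like behavior can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not a model taken from the text: a fixed-size population of nonrecombining chromosomes, at most one new harmful mutation per chromosome per generation, multiplicative fitness costs and no back mutation. The function name `simulate_ratchet` and all parameter values are hypothetical.

```python
import random


def simulate_ratchet(pop_size=50, generations=200, mutation_prob=0.3,
                     selection=0.02, seed=1):
    """Track the mutation load of the least-loaded chromosome over time."""
    rng = random.Random(seed)
    loads = [0] * pop_size  # harmful mutations carried by each chromosome
    least_loaded = []
    for _ in range(generations):
        # Each harmful mutation reduces fitness by a factor of (1 - selection).
        weights = [(1 - selection) ** k for k in loads]
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # No recombination and no back mutation: each offspring chromosome
        # inherits its parent's load, plus possibly one new mutation.
        loads = [k + (1 if rng.random() < mutation_prob else 0) for k in parents]
        least_loaded.append(min(loads))
    return least_loaded


history = simulate_ratchet()
print(history[0], history[-1])
```

Because a chromosome can never carry fewer mutations than its parent, the load of the best chromosome in the population can only stay the same or increase. Each loss of the least-loaded class is one irreversible click of the ratchet; recombination, which this sketch deliberately omits, is what would allow less-loaded chromosomes to be regenerated.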

Y chromosome degeneration and the development of X chromosome inactivation

The evolution of the mammalian sex determination system shown in Figure 14.14 is also inextricably interwoven with the evolution of the X-inactivation mechanism for dosage compensation (see Ellis, 1998). In response to large-scale destruction of Y chromosome sequences by the process described above there would have been pressure to increase gene expression on the X chromosome. However, this would lead to excessive X chromosome gene expression in females which could cause reduced fitness. As a result a form of gene dosage compensation evolved whereby a single X chromosome was selected to be inactivated in female cells (X-inactivation).

14.4. Evolution of human DNA sequence families and DNA organization

14.4.1. Gene duplication is a mechanism for generating functional divergence that has frequently been used in the evolution of mammalian genomes

In addition to large-scale gene duplication events involving the whole genome or large chromosomal segments (Section 14.2), selective duplication of specific genes can occur by small scale duplications involving copy transposition events and also tandem gene duplication.

Mechanisms resulting in small scale gene duplication

Duplicative (or copy) transposition involves a duplication of a DNA sequence prior to transposition. Small-scale DNA transposition in mammalian genomes most often occurs through an RNA intermediate and frequently results in a moderate to large interspersed repeat family. Processed copies of genes transcribed by RNA polymerase II normally lack functional regulatory sequences present in the original gene and, with few exceptions, degenerate into pseudogenes (processed pseudogenes; see Section 7.3.5). The exceptions all appear to be sequences which are copied from X-linked genes and which show testis-specific gene expression (e.g. the pyruvate dehydrogenase gene, PDHA2; see Section 14.3.3). Genes transcribed by RNA polymerase III, however, often contain an internal promoter sequence. After transposition the internal promoter can be used to transcribe a new copy which may be able to transpose in turn, eventually leading to very high copy numbers. This is the way in which the Alu repeat family appears to have evolved, using the reverse transcriptase of LINE1 elements to make cDNA copies (see Section 7.4.5 and Figure 14.25).

Figure 14.25. The human Alu repeat and the mouse B1 repeat evolved from processed copies of the 7SL RNA gene. Extensive homology of the Alu repeat sequences to the ends of the 7SL RNA sequence suggests that a polyadenylated copy of the 7SL RNA gene integrated elsewhere ...

Tandem gene duplication often occurs as a result of unequal crossover events or unequal sister chromatid exchanges. Numerous clustered human gene families show evidence of having acquired multiple members by this mechanism. In many cases, the duplicated genes degenerate into nonprocessed pseudogenes (Section 7.3.5). However, the transition between functioning duplicated gene and nonfunctional pseudogene may be a gradual one. This has given rise to the concept of the expressed pseudogene, a gene which is expressed at the mRNA level, or even at the polypeptide level, but which is nevertheless nonfunctional (Section 7.3.5). The absence of function means that selection pressure to conserve function will be relaxed and eventually mutations will accumulate, often leading to silencing of gene expression. Alternatively, the mutations may eventually result in the acquisition of different expression patterns and sometimes different functions (see Figure 14.2).

Acquisition of different expression patterns

Some diverged duplicated genes are known to be expressed predominantly in different environments. Sequence divergence in the different genes in the α-globin gene cluster and in the β-globin gene cluster may result in encoded products with slightly different biological properties. For example, the ε-, ζ- and γ-globin chains could possibly be especially suited to binding oxygen in the comparatively oxygen-poor environment of early development, whereas the α- and β-globin chains may be the preferred polypeptides in the environment of adult tissues.

Genes encoding different tissue-specific isoforms (alternative forms of the same protein) or isozymes (alternative forms of an enzyme) also appear to have evolved by gene duplication. For example, the enzyme alkaline phosphatase is encoded by at least four different genes which show tissue-specific differences in expression. Of these, three are clustered near the telomere of 2q: ALPI and ALPP encode alternative forms of the enzyme (87% protein sequence similarity) found in intestine and placenta, respectively, and ALPPL encodes a placental-like isozyme. A fourth member, ALPL, is located near the telomere of 1p and encodes an isozyme expressed in liver, bone, kidney and some other tissues, and is more distantly related to the intestinal and placental forms (57% and 52% sequence similarity, respectively). Note, however, that duplicated genes encoding subcellular-specific forms of the same protein are often located on different chromosomes, with gene duplication possibly arising from an ancestral genome duplication event. For example, in human liver there are two major isoforms of aldehyde dehydrogenase, a cytosolic and a mitochondrial form, which show 68% sequence identity over their 500 amino acid long sequences. The cytosolic and mitochondrial forms are, respectively, encoded by the ALDH1 gene on chromosome 9q and the ALDH2 gene on chromosome 12q. The two genes each have 13 exons and nine out of the 12 introns occur in homologous positions in the two coding sequences, strongly suggesting a common evolutionary origin by some kind of ancient gene duplication event (Strachan, 1992, pp. 32–33).

14.4.2. Concerted evolution occurs as a result of intragenomic (intraspecific) sequence exchanges within a DNA sequence family

In the case of certain gene and DNA sequence families, there may be a closer sequence relationship between individual family members in one species (paralogs) than that between orthologs in different species (concerted evolution). Thus, if we consider a specific gene family in two species, A and B, concerted evolution means that a family member in species A will be more closely related to other members of that family in species A than it will be to an ortholog or any other members of the same family in species B (Figure 14.15). Concerted evolution occurs because of various genetic mechanisms which cause sequence exchange between nonallelic DNA sequences within a genome. These mechanisms, which include unequal crossover, unequal sister chromatid exchange and gene conversion-like mechanisms (see Sections 9.3.2 and 9.3.3), are particularly prevalent in the case of tandemly repeated DNA sequences. For example, unequal crossover and unequal sister chromatid exchange can result in a specific repeat sequence spreading through an array of tandem repeats, and eventually replacing the other repeats, thereby resulting in sequence homogenization (see Figure 9.8). Because of meiotic recombination, the resulting effect can be transmitted to other genomes in a sexual population. As a result, concerted evolution may be observed between members of a DNA family within a species; sequence exchange between homologous sequences in the DNA from different animal species is essentially nonexistent.
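The defining comparison for concerted evolution can be sketched in a few lines of code. The example below is an illustrative addition: the helper names `percent_identity` and `shows_concerted_evolution` are assumptions, and the three short aligned sequences are invented toy data, not real gene sequences.

```python
def percent_identity(a, b):
    """Percentage of identical positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def shows_concerted_evolution(member_a1, member_a2, ortholog_b1):
    """True if a1 is closer to its paralog a2 (same species) than to its
    ortholog b1 (other species), the pattern expected under concerted evolution."""
    return percent_identity(member_a1, member_a2) > percent_identity(member_a1, ortholog_b1)


# Invented toy sequences: two family members from species A, one ortholog from species B.
species_a_member1 = "ATGGTGCTGTCTCCTGCC"
species_a_member2 = "ATGGTGCTGTCTCCTGCA"
species_b_member1 = "ATGGTCCTGTCACCTGAC"

print(shows_concerted_evolution(species_a_member1, species_a_member2, species_b_member1))
```

In real analyses the comparison would be made on full alignments of all family members in both species, usually by building a tree; the pairwise test here only captures the core criterion stated in the text.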

Figure 14.15. Concerted evolution occurs in DNA sequence families when a relatively high level of sequence exchange occurs between family members. The family illustrated has two members α1 and α2 which arose by tandem duplication, and was inherited ...

The rDNA genes provide a useful example of sequence exchange between repeats within a cluster but, in addition and more unusually, sequence exchanges can occur between clusters. In the human genome the rDNA genes are organized as large clusters of tandem repeats containing 50–60 copies of an approximately 40-kb repeat unit (see Figure 8.3). The high degree of sequence homology between such large repeats facilitates frequent sequence exchanges between nonallelic repeats. Additionally, the clusters are located on the short arms of the acrocentric chromosomes 13, 14, 15, 21 and 22 which frequently exchange sequences by nonhomologous chromosome translocations. As a result, individual human rDNA genes are more similar to each other than they are to the rDNA genes of other primates.

14.4.3. Some gene families do not show strong evidence of concerted evolution: sequence homologies between orthologs may be greater than between different family members in one species

In general, members of a gene family or superfamily which are located in the same gene cluster show a higher degree of sequence homology than do members present on different clusters, and the degree of sequence homology is usually greatest between closely neighboring genes within a cluster. This is so for the following reasons:

  • Gene duplication events leading to formation of any one cluster are often examples of recent tandem duplications, whereas the duplications that have given rise to the different clusters are often comparatively ancient and may have resulted from ancestral genome duplication events.
  • The evolution of gene clusters by a series of tandem duplications will tend to mean that closely neighboring genes are more likely to have originated by a recent tandem duplication than more distantly spaced genes in the same cluster.
  • Following gene duplication there may be two competing forces which affect the sequence identity between the duplicated genes: sequence divergence (the sequences of the duplicated genes may be identical initially but during evolution will gradually become different as a result of independent accumulation of mutations in the two genes); and sequence homogenization (periodic sequence exchanges between the two genes will tend to result in sharing of sequences between them and therefore maintain sequence identity). Such homogenization results from genetic mechanisms (unequal crossover, unequal sister chromatid exchange and gene conversion) which are much more prevalent in tandemly duplicated genes than in distantly located or nonsyntenic duplicated genes. As a result, the sequences of distantly spaced genes or nonsyntenic genes will have a tendency to diverge more rapidly than those of tandemly duplicated genes.

The globin gene superfamily provides some useful examples. Sequence homology between the genes and gene products from different clusters (e.g. α-globin and β-globin) is much less than between genes and gene products from a single cluster (Figure 14.16 and Table 14.2). This is largely so because the different clusters are presumed to have originated early in evolution while gene duplications within clusters occurred comparatively recently. In the latter case, some duplication events are presumed to have occurred very recently, leading to duplicated genes which are almost identical. For example, the two human α-globin genes HBA1 and HBA2 encode identical products, and the products of the two γ-globin genes HBG1 and HBG2 differ by a single amino acid. In other cases, the duplicated genes within a cluster are clearly more diverged in sequence, presumably because the relevant duplication events occurred some time ago.

Figure 14.16. Evolution of the globin superfamily. Globins encoded by genes within a cluster show a greater degree of sequence homology than those encoded by genes on different clusters (as do the genes themselves). The close relationship between genes within a cluster ...

Table 14.2. Sequence variation in globin genes.

In contrast to tandemly repeated genes, intracluster sequence exchanges between globin genes are likely to be infrequent (except for the very recently duplicated genes, such as HBA1 and HBA2). This is so because the different globin genes are small (1.6 kb) and the chromosomal DNA separating them is not well conserved. Additionally, the stringent developmental regulation of gene expression within a cluster presumably imposes a functional constraint, minimizing sequence exchanges between different types of gene in a cluster. As a result, the sequence homology between orthologs in distant species such as mouse and human may be greater than that between genes on the same human cluster. For example, the sequence homology between human and rabbit β-globin genes or even between human and mouse β-globin genes is greater than that between the human β-globin and ε-globin genes (Table 14.2). The β-ε globin gene split most likely occurred some time before human-mouse divergence and the lack of frequent sequence exchanges within the human β-globin cluster has resulted in the considerable genetic distance between human β-globin and human ε-globin being maintained.

Table 14.3. Comparison of genome organization and gene expression in humans and mice.

14.5. Evolution of gene structure

Eukaryotic genes are often larger and more complex than those from simple organisms. The process of gene elongation during evolution appears to have frequently involved repetition of existing amino acid sequences, often as a result of exon duplication. Additionally, the structure of many eukaryotic polypeptides suggests that frequent exchange of structural or functional protein domains has occurred at the gene level by exon shuffling, resulting in complex mosaic genes capable of specifying a variety of different protein modules.

14.5.1. Complex genes can evolve by intragenic duplication, often as a result of exon duplication

In addition to the other forms of DNA duplication discussed earlier, human genes, like other eukaryotic genes, often show evidence of intragenic DNA duplication which can be substantial. For example, many genes are known to encode polypeptides whose sequences are completely or mostly composed of large repeats, with sequence homology between the repeats being very high in some cases (see Table 7.8). In many cases, a repeat corresponds to a protein domain, and in some cases individual repeats are encoded as a result of exon duplication. Building of larger polypeptides by repetition of a previously designed protein module offers a variety of evolutionary advantages:

  • Dosage repetition. The ubiquitin-encoding genes, UbB and UbC, encode polypeptides containing, respectively, three and nine tandem repeats of the sequence for ubiquitin, a small protein with several functions, notably in proteolysis. Most of the proteins that are degraded in the cytosol are hydrolyzed in large protein complexes known as proteasomes but, before the proteins can be delivered to the proteasomes, they need to be tagged by covalent binding to a series of ubiquitin molecules, forming a multiubiquitin chain. Because many proteins are short lived, large amounts of ubiquitin molecules need to be synthesized. These genes may therefore have evolved to express many copies of the ubiquitin sequence by gene elongation through intragenic repetition, as opposed to the tandem gene duplication mechanism used in the case of genes encoding rRNA or histones. The large polypeptide precursor is then cleaved to generate multiple copies of the desired ubiquitin monomer.
  • Structural extension. Repeating domains may be particularly advantageous in the case of proteins that have a major structural role. An illustrative example is provided by the 41 exons of the COL1A1 gene which encode the part of α1(I) collagen that forms a triple helix; each exon encodes essentially an integral number of copies (one to three) of an 18 amino acid motif which itself is composed of six tandem repeats of the structure Gly-X-Y where X and Y are variable amino acids.
  • Domain divergence. In most cases, intragenic duplication events have been followed by substantial nucleotide sequence divergence between the different repeat units. Such divergence presumably provides the opportunity of acquiring different, though related, functions. Sometimes the degree of sequence divergence between the repeats is such that the repeated structure may not be obvious at the sequence level. For example, in the case of the variable and constant domains of immunoglobulins, conservation of the secondary structure is much more apparent than that of the amino acid sequence (Figure 8.2). In some cases where the repeated structure is not obvious, statistical analysis can nevertheless reveal evidence for structural similarity.

14.5.2. Exon shuffling permits diverse combinations of structure and functional modules, and may be mediated by transposable elements

Many genes encode modules found in another type of gene. Fibronectin, a large extracellular matrix protein, contains multiple repeated domains encoded by individual exons or pairs of exons and is a good example of classical exon duplication (Figure 14.17). One of the repeated domains was subsequently found in tissue plasminogen activator. Like fibronectin, tissue plasminogen activator also contains other domains. They include a structural module characteristic of the epidermal growth factor precursor, and so-called kringle modules which have been found in other polypeptides such as prourokinase and plasminogen, etc. (Figure 14.17). Such observations have suggested the possibility of mechanisms permitting exon shuffling between genes (Patthy, 1994).

Figure 14.17. Exon duplication and exon shuffling. The fibronectin gene contains 12 copies of an exon encoding a finger module, which is also found in the products of other genes such as the tissue plasminogen activator (TPA) gene. In addition, it contains 15 copies ...

Intragenic exon duplication can be explained by a variety of mechanisms including unequal crossover, or unequal sister chromatid exchanges (see Figure 9.7). In order to avoid frameshifts, one would expect selective amplification of exons with a total number of nucleotides exactly divisible by three (i.e. exons which are flanked by introns of the same phase, such as 0,0, 1,1 and 2,2 exons; see Box 14.3). This is what is observed for exons that are duplicated within a gene and also for exons encoding modules shared by different genes (Figure 14.17).
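The frame-preservation argument is simple modular arithmetic and can be sketched as follows. The code is an illustrative addition; the function names are assumptions. It uses the standard convention that a phase 0 intron lies between two codons, a phase 1 intron after the first nucleotide of a codon and a phase 2 intron after the second.

```python
def downstream_phase(upstream_phase, exon_length):
    """Phase of the intron that follows an exon of the given length (in nucleotides)."""
    if upstream_phase not in (0, 1, 2):
        raise ValueError("intron phase must be 0, 1 or 2")
    return (upstream_phase + exon_length) % 3


def is_symmetric(upstream_phase, exon_length):
    """True for 0,0, 1,1 and 2,2 exons, i.e. exons flanked by introns of the same phase."""
    return downstream_phase(upstream_phase, exon_length) == upstream_phase


def frame_preserved_after_duplication(upstream_phase, exon_length):
    """True if a tandem copy of the exon leaves the downstream reading frame unchanged."""
    one_copy = downstream_phase(upstream_phase, exon_length)
    two_copies = downstream_phase(upstream_phase, 2 * exon_length)
    return one_copy == two_copies


print(is_symmetric(1, 123), is_symmetric(1, 124))
print(frame_preserved_after_duplication(0, 90), frame_preserved_after_duplication(0, 91))
```

An exon is symmetric exactly when its length is a multiple of three, whatever the phase of the flanking introns, and only such exons can be duplicated or inserted into an intron of matching phase without shifting the reading frame of the exons that follow.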

Box 14.3. Intron groups and intron phases. Introns are heterogeneous entities with different functional capacities and notable structural differences. Depending on the extent to which they rely on extrinsic factors to engage in RNA splicing and on the nature of ...

How do genes which are not necessarily closely related come to share sequences encoding very similar protein modules? One attractive possibility is retrotransposon-mediated exon shuffling. The most abundant retrotransposons in the human genome are the LINE1 (L1) elements, which belong to the non-LTR class of retrotransposons (Section 7.4.6). Moran et al. (1999) developed an efficient L1 retrotransposition assay in cultured human cells and showed that L1 can insert into the intron of a gene and thence can make a copy of a downstream exon which can be inserted into another gene following another round of retrotransposition. This is possible because the L1 retrotransposition machinery has a weak specificity for its own 3′ end (and can act on other sequences, including Alu sequences and processed pseudogenes, which both lack reverse transcriptase). Because the L1 element's own poly(A) signal is weak, transcription of an L1 repeat within a gene often bypasses its own poly(A) signal and uses instead a downstream poly(A) signal from the host gene. In so doing it can make a copy of a host exon which can be stitched into another gene after another retrotransposition event (Figure 14.18).

Figure 14.18. Exon shuffling between genes can be mediated by transposable elements. The LINE1 (L1) sequence family contains members that actively transpose in the human genome. L1 elements have weak poly(A) signals and so transcription can continue past such a signal ...

14.5.3. The origin of spliceosomal introns is controversial but their phylogenetic distribution suggests that many introns have been inserted into genes comparatively recently in evolution

Following the discovery of split genes in 1977, the significance of spliceosomal introns has been intensely debated. The introns found in mammalian genes are large compared with those in other species and the intron sequences are not well conserved. Nevertheless, it is becoming apparent that many introns contain functionally important sequences involved in gene regulation and the sequences of some short introns have been considerably conserved in evolution (see Table 14.2 for an example). Any proposed function for introns in human genes cannot be a general one, however, because a small minority of genes lack introns (see Table 7.6).

The evolution of spliceosomal introns and their relationship to exons have also been the subject of much controversy (Logsdon, 1998). Essentially there are two alternative positions:

  • The ‘introns-early’ view. Different versions of the introns-early view exist but the exon theory of genes has been the most influential. It considers that exons are the descendants of ancient minigenes and spliceosomal introns are the descendants of self-splicing spacers which were located between the minigenes and were present in primordial cells. Exons have been considered as units which encode structural or functional domains, permitting evolution of larger genes by exon shuffling, a strategy that was favored particularly in eukaryotes. By contrast, introns are imagined to have been effectively lost from archaea and eubacteria.
  • The ‘introns-late’ view. This idea does not deny that exon shuffling between genes occurs but holds that split genes have arisen as a result of comparatively recent insertion of introns into genes. In this case, spliceosomal introns are thought to have descended from group II introns (see Box 14.3). The latter type of intron can function as a mobile element and is envisaged to have been introduced when a prokaryotic cell was endocytosed by a precursor to eukaryotic cells.

An important component of the exon theory of genes was the idea that exons in polypeptide-encoding genes represented functional or structural units. Exons consisting only of untranslated sequences (e.g. the first exon of the insulin gene; see Figure 14.19) cannot be accommodated in this view. Even in the case of coding exons, however, the evidence has been meager and in four major examples cited as evidence for the exon theory of genes, objective methods for detecting correspondence between exons and units of protein structure failed to identify any such correspondence (Stoltzfus et al., 1994). Another line of evidence used to support the introns-early view has been the claim that only phase 0 introns are correlated with the structure of ancient proteins (de Souza et al., 1998), but the phase correlations could instead reflect insertional bias.

Figure 14.19. Introns within polypeptide-encoding DNA can be grouped into three phases, according to the point of insertion. Phase 0 introns do not interrupt codons unlike phase 1 and phase 2 introns. A phase 1 intron in the human insulin gene INS interrupts a codon ...
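The phase classification can be made concrete with a short calculation. The sketch below assumes the usual convention that an intron's phase is the number of coding nucleotides upstream of it, modulo 3; the function names and the example positions are hypothetical and are not taken from any particular gene.

```python
def intron_phase(coding_bases_before: int) -> int:
    """Return the phase (0, 1 or 2) of an intron.

    coding_bases_before is the number of coding nucleotides, counted from
    the first base of the start codon, that lie upstream of the intron.
    """
    if coding_bases_before < 0:
        raise ValueError("coding_bases_before must be non-negative")
    return coding_bases_before % 3


def describe_phase(phase: int) -> str:
    """Give a plain-language description of what each phase means."""
    descriptions = {
        0: "lies between two codons",
        1: "interrupts a codon after its first base",
        2: "interrupts a codon after its second base",
    }
    return descriptions[phase]


# Hypothetical insertion points, chosen only to show all three phases.
for position in (186, 187, 188):
    phase = intron_phase(position)
    print(f"{position} coding bases upstream: phase {phase}, {describe_phase(phase)}")
```

Only a remainder of 0 leaves every codon intact, which is why phase 0 introns are the ones that can separate exons encoding whole numbers of amino acids.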

The exon theory of genes was also supported by the apparent conservation of the positions of introns in genes known to have duplicated early in evolution, such as the globin genes (Figure 14.20). Against this view, the intron locations in numerous gene families (e.g. actin, myosin and tubulin families) are not conserved, suggesting instead that introns have been inserted recently, and in general phylogenetic studies have very strongly supported an introns-late model (Logsdon, 1998). One major problem with the introns-early model is that the requirement for introns to have been present since primordial times does not fit well with the very large number of positions where introns occur within a gene when different species are compared. It would mean not only that many original introns must have been lost subsequently from genes, but also that some genes must originally have had such a large number of introns that the corresponding exon sizes must have been tiny.

Figure 14.20. Members of the globin superfamily contain two introns which show reasonable conservation of positions, but not of size. Boxes represent the mature polypeptides. Numbers contained within the boxes are the amino acid positions. Gene sizes are indicated to ...

14.6. What makes us human? Comparative mammalian genome organization and the evolution of modern humans

The virtual universality of the genetic code, the high degree of conservation of key biochemical reactions, the huge evolutionary conservation of key developmental processes - these are features which emphasize the close relationship of humans to species that are morphologically quite distinct and evolutionarily distantly related. So what is it that makes us different? While many of the fundamental features of human cells, genome organization and gene expression are common to all eukaryotes, mammalian-specific features can be identified, such as genomic imprinting and X inactivation. In addition, certain other components of the genome or aspects of its expression show still higher levels of specificity.

14.6.1. What makes us different from mice?

Increasingly we rely on extrapolation from mouse studies to infer the situation in humans. For example, our knowledge of gene expression patterns in early human development is minimal because of the lack of early stage embryos for study; instead we study readily available mouse embryos. Because of the power of transgenic technology and gene targeting in mouse embryonic stem cells, the mouse is the most commonly used animal model of human disease. The extrapolation from mouse studies to humans has been justified by the general assumption that genomic DNA organization and gene expression patterns of mice and humans have been highly conserved, despite the approximately 110 million years since the two lineages diverged from a common ancestor (Kumar and Hedges, 1998). Increasingly, however, there is greater appreciation of differences between the two species.

General aspects of genome organization

The genome sizes are comparable (3000 Mb of DNA), and both genomes can be divided into isochores (large chromosomal regions in which the base composition of the DNA is comparatively homogeneous but which is variable between isochores). The human isochore classes include two light (AT rich) classes L1 and L2, and three heavy (GC rich) classes H1, H2 and H3, but the mouse genome is comparatively lacking in the H3 isochore (Sabeur et al., 1993). Cytogenetic analyses appear to reveal very different chromosome organizations: the mouse has 20 pairs of acrocentric chromosomes whereas there are 23 pairs of human chromosomes, most of which are metacentric or submetacentric. Nevertheless, comparison of high resolution mouse and human chromosome maps has indicated that orthologous chromosomal segments (e.g. those containing the major histocompatibility complex of mouse and humans) are located in regions where there is considerable similarity of cytogenetic banding patterns, albeit over relatively small chromosomal regions (Sawyer and Hozier, 1986).

Gene number

There appears to have been an erosion of CpG islands from the mouse genome: approximately 45 000 CpG islands are found in the human genome, but only 37 000 in the equivalent sized mouse genome (Antequera and Bird, 1993). This does not simply reflect a proportional reduction in gene number in the mouse genome because analysis of the sequence databases suggests that about 56% of human genes but only about 47% of mouse genes have CpG islands. On this basis, therefore, the total number of genes in humans and mice would appear to be much the same (about 80 000 when calculated from CpG island data). However, gene families often show different numbers of genes in different mammals (Figure 14.21). Such differences are expected to reflect complex processes of gene duplication and loss, with clear evidence of interlocus sequence exchanges (Figure 14.22).
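The arithmetic behind the gene number estimate can be checked directly. The following is a minimal sketch which assumes that each CpG island marks exactly one gene, so that dividing the island count by the fraction of genes carrying an island gives the total; the function name is illustrative only, and the input figures are those quoted above.

```python
def genes_from_cpg_islands(n_islands: int, fraction_with_island: float) -> float:
    """Estimate total gene number, assuming one CpG island per island-bearing gene."""
    if not 0 < fraction_with_island <= 1:
        raise ValueError("fraction_with_island must be in (0, 1]")
    return n_islands / fraction_with_island


# Island counts and fractions as quoted in the text.
human_genes = genes_from_cpg_islands(45_000, 0.56)
mouse_genes = genes_from_cpg_islands(37_000, 0.47)

print(f"human: about {human_genes:,.0f} genes")
print(f"mouse: about {mouse_genes:,.0f} genes")
```

Under this assumption the two estimates differ by only a few per cent, even though the island counts themselves differ by nearly 20%.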

Figure 14.21. The organizations of orthologous gene families can show considerable differences in different mammals. Shading of genes indicates proposed orthologous relationships, so that, for example, the horse ψα gene is orthologous to the θ ...

Figure 14.22. The evolution of the mammalian β-globin gene cluster has involved frequent gene duplications, conversions and gene loss or inactivation. Note that, in addition to gene duplication and gene loss events, there are frequent examples where the sequence of ...

In some cases, human genes do not appear to have any rodent orthologs, or if they exist there has been so much sequence divergence that they cannot be identified by standard hybridization methods even at low stringency. Examples include several of the genes mapping at or close to the major pseudoautosomal region, such as SHOX (a locus for Leri-Weill syndrome), ANT3, MIC2 and KAL (the Kallmann syndrome gene). Similarly, there appear to be four human apolipoprotein (a) genes but none can be detected in rodent genomes (Lawn, 1996). The apolipoprotein (a) genes most likely arose recently, following a duplication of the related plasminogen locus. See also Ottolenghi and Vekemans (1998).

Gene distribution

Gene order has not been generally well conserved between human and mouse chromosomes. As in most mammals, human-mouse comparisons show a generally strong conservation of genes on the X chromosome. However, a few genes on the human X chromosome are known to have autosomal orthologs in mouse (see Figure 14.11). In addition, the general order of genes on the human and mouse X chromosomes is rather different, although conserved over subchromosomal regions (see Figure 14.10). For any one human or mouse autosome, orthologous regions are found on a variety of different chromosomes in the other species (Figure 14.23). However, again there is conservation of gene order over small to moderate sized subchromosomal regions. Such partial conservation of synteny (i.e. a group of linked genes in one species is paralleled by a linkage group between the orthologous genes in the other species) has proven to be very useful in identifying some human disease genes (see Section 15.4.3).

Figure 14.23. Conservation of orthologous human and mouse linkage groups is limited to subchromosomal regions. Note that genes on human chromosome 10 (HSA10) have orthologs on at least seven different mouse chromosomes. Human chromosome 21 (HSA21) shows considerable ...

Gene organization and gene expression

The sizes of coding DNA in mouse and human genes are nearly identical, with an average size of perhaps about 550 codons (Makalowski and Boguski, 1998 and unpublished data). The respective polypeptide sequences show a high degree of sequence similarity, often within the 80–95% range. However, different classes of polypeptides may be extremely conserved (such as many gene products that are important in development or in crucially important cellular functions such as ribosomal function) while others, notably ligands and receptors that are important in host defense, can be much more divergent (Figure 14.24). The sequence similarity of coding DNA is generally a few per cent less than that for the polypeptide products (largely because of silent nucleotide substitutions, notably at the third base position of codons).

Figure 14.24. Amino acid sequence divergence between human and rodent orthologs. The average human-rodent sequence divergence is shown for 14 protein families, representing a total of 603 proteins compared between human and mouse/rat. Note that certain types of protein, ...

Species differences in gene expression include differences in RNA processing and the alternative usage of promoters (see also Ottolenghi and Vekemans, 1998). For example, the human aldolase A gene has an additional promoter which does not function in the rat ortholog, and similar differences are expected for some human and mouse orthologs. Other human-mouse differences include considerable differences in the pattern of X chromosome inactivation (see Figure 14.12) and also in the conservation of imprinting. For example, the mouse insulin-like growth factor 2 receptor gene Igf2r is imprinted and paternal alleles are not expressed, but a different pattern of polymorphic imprinting is found in humans.

Noncoding DNA

Introns and noncoding DNA flanking genes are generally so highly diverged that alignment of orthologous sequences from the two species can be extremely difficult unless the comparison is confined to sequences which are located close to exons. Thus, very short introns (less than 200 bp) can be aligned and compared (see Table 14.2) but larger introns (accounting for the great majority of introns) are progressively more difficult to compare because of the very high sequence divergence. However, a striking example of the conservation of noncoding DNA occurs in the TCR cluster, where sequencing of about 100 kb in mouse and humans reveals a sequence identity of approximately 70%, even though only about 6% of the DNA is coding DNA (Koop and Hood, 1994). This is likely to be related to the very unusual mechanisms for expressing TCR and immunoglobulin genes (Section 8.6).

Telomeric minisatellite DNA is conserved between mice and humans, and indeed the TTAGGG repeats are conserved throughout vertebrates, presumably because of selection pressure to ensure continued recognition by the telomerase enzyme (Figure 2.9). However, highly repetitive DNA sequences in general are among the most rapidly diverging sequences because of the virtual absence of conservative selection pressure. For example satellite DNA sequences in the human and mouse genomes are quite different, and there is poor conservation of hypervariable minisatellites and microsatellites at orthologous locations in humans and mice. Nevertheless, there are several examples of apparent conservation of intragenic microsatellites located within orthologs in human and mouse or rat (Stallings, 1995).

In general, highly repeated interspersed elements are not well conserved. The Alu repeat appears to have evolved as a processed pseudogene from transcripts of the 7SL RNA gene (Ullu and Tschudi, 1984; see Figure 14.25) and appears to be specific to primates. It does, however, have a type of counterpart in the mouse genome, the B1 repeat, which also appears to have been generated from a 7SL RNA-like gene. The huge amplification in copy number appears largely to have been generated some time ago in both species, and the sequence divergence between the consensus sequences for the two types of repeat family is sufficiently high that probes can be made from them which permit distinction between the two genomes. The LINE-1 repeats are, however, conserved in humans and mouse (and throughout mammals), largely because of conservative selection pressure to maintain the sequence of the large ORF2 sequence which specifies the reverse transcriptase.

14.6.2. What makes us different from the great apes?

Traditional primate classifications place humans as the sole living members of the family Hominidae, whereas the African great apes (gorillas, chimpanzees and bonobos, i.e. pygmy chimpanzees) are placed together with the Asian great apes (orangutans) within the subfamily Ponginae of the family Pongidae. However, this anthropocentric view has been strongly challenged by overwhelming evidence that the African great apes share a more recent common ancestry with humans than with orangutans (see Goodman, 1999). Nucleotide sequence data indicate that divergence of human-chimpanzee, human-gorilla and human-orangutan lineages occurred about 5.5, 6.7 and 8.2 million years ago respectively (Kumar and Hedges, 1998), and humans are now thought to be particularly closely related to chimpanzees and bonobos (Goodman et al., 1998). The divergence into separate species may initially have been driven by small cytogenetic differences and/or mutations in key genes regulating gamete formation or early embryonic development. However, once speciation had been accomplished, the effective reproductive isolation meant that species-specific patterns of intragenomic sequence exchange could extend the differences between species.

Genome organization and coding DNA

Cytogenetic comparisons of the great apes emphasize the very strong conservation of banding patterns (Yunis and Prakash, 1982). The only major structural differences are a number of pericentric and paracentric inversions, the recent fusion of two chromosomes to form human chromosome 2, and a reciprocal translocation between the gorilla chromosomes which correspond to human chromosomes 5 and 17. In addition, the extent of heterochromatinization is variable, with most gorilla chromosome arms and about half of the chimpanzee chromosome arms containing terminal heterochromatic G bands which are absent from human and orangutan chromosomes (see Figure 14.26). Although the present information on comparative gene mapping in primates is sketchy, the available details show evidence of strong conservation of synteny (i.e. linked genes in humans are almost always linked in the great apes). However, large-scale organization at certain loci can differ. For example, a large part of the human Ig κ locus on 2p is duplicated (see Figure 8.28) but this is not the case in the corresponding chimpanzee and gorilla chromosomes (Ermert et al., 1995).

When orthologous human and chimpanzee sequences are compared, the coding DNA typically shows 98–100% sequence identity (see Table 14.2 for an example). Indeed, in some cases, specific alleles of certain human genes are more closely related to orthologs in chimpanzees than they are to other human alleles. For example, at the human HLA-DRβ locus, the alleles HLA-DRB1*0302 and HLA-DRB1*0701 are clearly closer in sequence to certain alleles of the orthologous chimpanzee (Pan troglodytes) gene Patr-DRB than they are to each other (Figure 14.27). Such observations are consistent with a comparatively ancient origin for such divergent alleles, predating human-chimpanzee divergence. Although there are as yet limited data, extrapolating from known human-mouse differences suggests that extremely few human genes could be expected to lack counterparts in the chimp and gorilla genomes and vice versa. In just about all cases, one would expect that these human-specific genes would have arisen by very recent gene duplication events, so that both gene copies may be identical and, as such, rather unlikely to contribute much to the observed anatomical and developmental differences between humans and the great apes. Nevertheless, human-primate differences do exist and the molecular basis for these differences is now beginning to be discovered (Gibbons, 1998).

Figure 14.27. Some human alleles show greater sequence divergence than when individually compared with orthologous chimpanzee genes. From a total of 270 amino acid positions, the HLA-DRB1*0302 and HLA-DRB1*0701 alleles show a total of 31 differences (13%). Comparison ...

Noncoding DNA

Noncoding DNA from humans and apes can show extremely high levels of sequence similarity. For example, pairwise comparisons of orthologous noncoding sequences spanning more than 12.5 kb of the β-globin gene cluster showed sequence divergence of only 1.7, 1.8 and 3.3% in the case of human-chimp, human-gorilla and human-orangutan comparisons, respectively (Goodman et al., 1998). However, highly repeated DNA families appear to be undergoing a more rapid evolution. Although a common alphoid sequence is conserved in all human and great ape chromosomes (Baldini et al., 1993), the vast majority of human chromosome-specific alphoid sequences do not hybridize to the centromeres of the corresponding chimpanzee and gorilla chromosomes (Archidiacono et al., 1995). In addition, a subterminal satellite DNA located adjacent to the telomeres of chimpanzee and gorilla chromosomes has no counterpart in human and orangutan chromosomes (Royle et al., 1994). This satellite is most likely the major component of the additional heterochromatic terminal G bands of chimpanzee and gorilla chromosomes (see Figure 14.26). Minisatellite and microsatellite sequences can also differ between humans and other primates. Telomere sequences are conserved but hypervariable minisatellite sequences show transient evolution in the primate genomes: highly polymorphic human minisatellites are often monomorphic or show minimal variability in the corresponding chromosomes of the great apes (Gray and Jeffreys, 1991). Microsatellites also show differences at orthologous positions in humans and other primates (Rubinsztein et al., 1995).

Table 14.4. Comparison of genome organization and gene expression in humans and the great apes.

Highly repetitive interspersed DNA can also show differences. Although the Alu repeat family is found in other primates, several different subfamilies have been recognized (see Jurka and Milosavljevic, 1991, for a classification) and appear to have spread at different periods of primate evolution. The average age of the oldest subfamily, the Alu J repeats, was estimated at about 55 million years. This subfamily, like other old subfamilies, is characterized by considerable divergence between the members but comparatively close resemblance of the consensus to the 7SL RNA sequence. A small number of the Alu sequences belong to subfamilies which are extremely recent in evolutionary origin and contain members that are actively transposing. They include the Sb1 (previously alternatively known as the PV or HS subfamily) and Sb2 subfamilies which appear, on the basis of copy number, to be very largely human-specific (Zietkiewicz et al., 1994).

14.6.3. DNA-based studies indicate that the genetic diversity in humans is very limited and that we are descended from individuals who lived in east Africa about 200 000 years ago

The limited genetic diversity in humans

Humans are unusual among primates in that we show much more limited genetic variability than our close relatives, the chimpanzees, and other apes. For example, the sequence of a 729 bp intron in the ZFY gene on the Y chromosome revealed no differences when the Y chromosomes of 38 different men were sampled, although there were many differences when referenced against the equivalent sequences from chimpanzees, gorillas and orangutans (Dorit et al., 1995). Other studies on genes located in diverse genomic regions all confirm the low nucleotide diversity and therefore limited genetic variability of humans (Li and Sadler, 1991). By contrast, the genetic diversity in individual species of the great apes is very much higher. Chimpanzees show substantially more genetic variation in their nuclear genomes than humans do, and several chimpanzee and bonobo clades (and even single social groups) have retained substantially more mitochondrial variation than is seen in the entire human species (Gagneux et al., 1999). These findings strongly suggest that at a recent time in our evolutionary past, the human population went through a genetic ‘bottleneck’ (i.e. a severe reduction in effective population size), so that a large part of the previously existing genetic variability was lost.
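Nucleotide diversity, the measure referred to above, is commonly computed as the average number of differences per site between all pairs of sequences in a sample. The sketch below illustrates that calculation on invented toy sequences; the function name and the data are hypothetical and are not drawn from the studies cited.

```python
from itertools import combinations


def nucleotide_diversity(sequences: list[str]) -> float:
    """Average pairwise differences per site across aligned, equal-length sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    pairs = list(combinations(sequences, 2))
    # Count mismatched positions for every pair of sequences.
    total_differences = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return total_differences / (len(pairs) * length)


# Toy samples (invented): one with no variation, one with a few variants.
invariant_sample = ["ACGTACGTAC"] * 4
variable_sample = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]

print(nucleotide_diversity(invariant_sample))
print(nucleotide_diversity(variable_sample))
```

A sample such as the ZFY intron data, in which every sequence is identical, gives a diversity of zero however many individuals are sampled.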

The great majority of the existing overall genetic variation in humans is represented by individual diversity within populations. By contrast, differences between racial groups account for only about 10% of the total variation (Barbujani et al., 1997). Differences do occur, however, in the extent of genetic diversity within different populations, with African populations demonstrating the greatest diversity. These data are consistent with a recent genetic bottleneck followed by rapid population expansion from African populations (see also below).

DNA analyses on fossil remains

Research into our recent evolutionary past has traditionally been bedevilled by the incompleteness of the fossil record although exciting finds continue to be made about our recent ancestors (Culotta, 1999). DNA-based studies have provided a powerful alternative way of analysing fossils. Because the DNA from such ancient remains is present in small amounts and has been considerably degraded, PCR is used to amplify small overlapping portions of the DNA which can then be sequenced. Unfortunately, there is an age limit to this approach because of the chemical instability of DNA; previous hopes of large scale ancient DNA studies have been tempered by the realization that most samples more than 100 000 years old fail to yield amplifiable DNA.

Despite the above limitations, some important findings have been made. Krings et al. (1997) reported the successful amplification and sequencing of a portion of the hypervariable segment of the mtDNA control region from a Neanderthal fossil estimated to be 30 000–100 000 years old. Neanderthals were a population of archaic humans who inhabited Europe and Western Asia between about 230 000 and 30 000 years ago. For part of this time they coexisted with modern humans, but their relationship to modern humans was unclear. The careful study of Krings et al. (1997) showed that the Neanderthal mtDNA sequence was clearly different from that of modern humans (about three times the average difference found between different humans but about half the average difference between humans and chimpanzees). They were able to conclude that Neanderthals went extinct without contributing any mtDNA to modern humans, and that the lineages leading to Neanderthals and modern humans diverged about 550 000–690 000 years ago.

Reconstructing the history of human populations

A variety of DNA-based studies have been carried out on extant human and primate populations in order to infer our evolutionary past. DNA sequences or markers from selected loci are studied, usually in large numbers of individuals, and phylogenies are compiled based on the degree of relatedness of the individual samples (see von Haeseler et al., 1995 and Jorde et al., 1998 for the types of approach that are used). In many cases sequences or markers from mtDNA or from the nonrecombining portion of the Y chromosome have been used, or noncoding sequences in nuclear DNA. The absence of recombination in mtDNA/Y sequences makes interpretation of the data easier, and coalescence approaches are more easily applied to estimate the date of the common ancestor of the individuals sampled (Figure 14.28). Using this type of approach, both mtDNA and Y chromosome DNA studies indicate that the common ancestor of modern humans can be traced back to somewhat less than about 200 000 years ago. This does not mean, of course, that only a single person (e.g. ‘the mitochondrial Eve’) was present at that time; instead, it simply means that the mtDNA or Y chromosome DNA of the other people living at that time was not transmitted to the present human population (Figure 14.28).

Figure 14.28. Coalescence analyses seek to trace back lineages until they coalesce in a single individual. The example shows uniparental inheritance which can be tracked using mtDNA markers (inheritance through the maternal line) or by using markers from the nonrecombining portion ...
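The idea of tracing uniparental lineages back until they coalesce can be illustrated with a small simulation. The following is a minimal sketch, assuming an idealized population of constant size with non-overlapping generations in which each individual draws its single parent at random (a haploid Wright-Fisher model); the population size, sample size and function name are arbitrary illustrative choices, not values from the studies described.

```python
import random


def generations_to_common_ancestor(
    sample_size: int, population_size: int, rng: random.Random
) -> int:
    """Count generations back until all sampled lineages share one ancestor."""
    if not 1 <= sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    # Start from distinct sampled individuals in the present generation.
    lineages = set(rng.sample(range(population_size), sample_size))
    generations = 0
    while len(lineages) > 1:
        # Each surviving lineage picks one parent at random; lineages that
        # pick the same parent merge (coalesce) in the resulting set.
        lineages = {rng.randrange(population_size) for _ in lineages}
        generations += 1
    return generations


rng = random.Random(7)  # fixed seed so that repeated runs agree
times = [generations_to_common_ancestor(10, 200, rng) for _ in range(100)]
mean_time = sum(times) / len(times)
print(f"mean generations to common ancestor: {mean_time:.0f}")
```

In every run the sampled lineages trace back to one individual, yet that individual lived in a population of the same size as all the others; the remaining members of that generation simply left no surviving lineages among those sampled.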

As mtDNA and the nonrecombining Y sequences are inherited through the maternal line and the paternal line respectively, the above analyses also give insights into the possibility of differential migration of the two sexes. Perhaps surprisingly, these types of studies have shown that female genes have geographically migrated more than male genes. One possible explanation is that men may typically have traveled greater distances in their lifetimes, but when it came time to settle down and have children they often went home to their birthplaces to be joined by spouses who may have had to migrate some distance from their own birthplaces.

Where have we come from: the African replacement (‘Out of Africa’) model versus the multiregional hypothesis

DNA-based studies have proved very valuable in studying recent human history (see Cavalli-Sforza et al., Further Reading) but there has been a long-standing controversy regarding the geographical origins of modern humans. There is no doubt that African populations have the largest amount of genetic diversity but two competing hypotheses attempt to explain our origins:

  • The multiregional hypothesis (Figure 14.29A). This states that modern Homo sapiens evolved from more archaic forms over the course of a million years or so, at several different locations in the Old World. The high degree of genetic homogeneity was maintained by natural selection and by gene flow between different populations.
  • The African replacement hypothesis (Figure 14.29B). This states that modern humans arose in Africa approximately 100 000–200 000 years ago and dispersed throughout the Old World to replace archaic human species completely.

Figure 14.29. Competing hypotheses to explain the origins of modern humans. (A) The multiregional hypothesis. Modern Homo sapiens is imagined to have evolved from more archaic forms over the course of a million years or so, at several different locations in the Old ...

The debate regarding these two competing hypotheses remains to be resolved. Evolutionary theory predicts that an older, ‘source’ population will typically have greater diversity than a population derived more recently from it and the very strong evidence for greater genetic diversity of African populations (Jorde et al., 1998) is consistent with the African replacement hypothesis. However, there remains considerable uncertainty regarding various parameters such as gene flow patterns, and population sizes. Perhaps we will have a more definitive answer by the year 2030, say, when the entire genomes of almost everyone on the planet may have been sequenced using DNA chips!

Further reading

  1. Cavalli-Sforza LL, Menozzi P, Piazza A (1994) The History and Geography of Human Genes. Princeton University Press, Princeton, NJ.
  2. Doolittle W F. Phylogenetic classification and the universal tree. Science. (1999);284:2124–2128. [PubMed: 10381871]
  3. Jackson M, Strachan T, Dover GA (1996) Human Genome Evolution. BIOS Scientific Publishers, Oxford, UK.
  4. Jones S, Martin R, Pilbeam D (1992) The Cambridge Encyclopaedia of Human Evolution. Cambridge University Press, Cambridge.
  5. Li WH, Graur D (1991) Fundamentals of Molecular Evolution. Sinauer Associates, Sunderland, MA.
  6. Nei M (1987) Molecular Evolutionary Genetics. Columbia University Press, New York.
  7. The NCBI Taxonomy database at

References


  1. Antequera F, Bird A. Number of CpG islands and genes in human and mouse. Proc. Natl Acad. Sci. USA. (1993);90:11995–11999. [PMC free article: PMC48112] [PubMed: 7505451]
  2. Archidiacono N, Antonacci R, Marzella R, Finelli P, Lonoce A, Rocchi M. Comparative mapping of human alphoid sequences in great apes using fluorescence in situ hybridization. Genomics. (1995);25:477–484. [PubMed: 7789981]
  3. Baldini A, Ried T, Shridhar V, Ogura K, D'Aiuto L, Rocchi M, Ward D C. An alphoid DNA sequence conserved in all human and great ape chromosomes: evidence for ancient centromeric sequences at human chromosomal regions 2q21 and 9q13. Hum. Genet. (1993);90:577–583. [PubMed: 8444464]
  4. Barbujani G, Magagni A, Minch E, Cavalli-Sforza L L. An apportionment of human DNA diversity. Proc. Natl Acad. Sci. USA. (1997);94:4516–4519. [PMC free article: PMC20754] [PubMed: 9114021]
  5. Belfort M. An expanding universe of introns. Science. (1993);262:1009–1010. [PubMed: 7694364]
  6. Blair H J, Reed V, Laval S H, Boyd Y. New insights into the man-mouse comparative map of the X chromosome. Genomics. (1994);19:215–220. [PubMed: 8188251]
  7. Cavalier-Smith T. Intron phylogeny: a new hypothesis. Trends Genet. (1991);7:145–148. [PubMed: 2068786]
  8. Culotta E. A new human ancestor? Science. (1999);284:572–573. [PubMed: 10328732]
  9. de Souza S J, Long M, Klein R J, Roy S, Lin S, Gilbert W. Towards a resolution of the introns early/late debate: only phase zero introns are correlated with the structure of ancient proteins. Proc. Natl Acad. Sci. USA. (1998);95:5094–5098. [PMC free article: PMC20219] [PubMed: 9560234]
  10. Disteche C M. Escape from X-inactivation in human and mouse. Trends Genet. (1995);11:17–22. [PubMed: 7900190]
  11. Doolittle W F. You are what you eat: a gene transfer ratchet could account for bacterial genes in eukaryotic nuclear genomes. Trends Genet. (1998);14:307–311. [PubMed: 9724962]
  12. Dorit R L, Akashi H, Gilbert W. Absence of polymorphism at the ZFY locus on the human Y chromosome. Science. (1995);268:1183–1185. [PubMed: 7761836]
  13. Eichler E E. Masquerading repeats: paralogous pitfalls of the human genome. Genome Res. (1998);8:758–762. [PubMed: 9724321]
  14. Ellis N. The war of the sex chromosomes. Nature Genet. (1998);20:9–10. [PubMed: 9731521]
  15. Ermert K, Mitlohner H, Schempp W, Zachau H G. The immunoglobulin kappa locus of primates. Genomics. (1995);25:623–629. [PubMed: 7759095]
  16. Gagneux P, Wills C, Gerloff U, Tautz D, Morin P A, Boesch C, Fruth B, Hohmann G, Ryder O A, Woodruff D S. Mitochondrial sequences show diverse evolutionary histories of African hominoids. Proc. Natl Acad. Sci. USA. (1999);96:5077–5082. [PMC free article: PMC21819] [PubMed: 10220421]
  17. Garcia-Fernandez J, Holland P W. Archetypal organization of the amphioxus Hox gene cluster. Nature. (1994);370:563–566. [PubMed: 7914353]
  18. Gibbons A. The mystery of humanity's missing mutations. Science. (1995);267:35–36. [PubMed: 7809607]
  19. Gibbons A. Which of our genes make us human? Science. (1998);281:1432–1434. [PubMed: 9750111]
  20. Goodman M. The genomic record of humankind's evolutionary roots. Am. J. Hum. Genet. (1999);64:31–39. [PMC free article: PMC1377699] [PubMed: 9915940]
  21. Goodman M, Porter C A, Czelusniak J. Towards a phylogenetic classification of primates based on DNA evidence complemented by fossil evidence. Mol. Phylogenet. Evol. (1998);9:585–598. [PubMed: 9668008]
  22. Graves J A M, Wakefield M J, Toder R. The origin and evolution of the pseudoautosomal regions of human sex chromosomes. Hum. Mol. Genet. (1998);7:1991–1996. [PubMed: 9817914]
  23. Gray I C, Jeffreys A J. Evolutionary transience of hypervariable minisatellites in man and the primates. Proc. R. Soc. Lond. B Biol. Sci. (1991);243:241–253. [PubMed: 1675801]
  24. Gray M W, Burger G, Lang B F. Mitochondrial evolution. Science. (1999);283:1476–1481. [PubMed: 10066161]
  25. Hardison R, Miller W. Use of long sequence alignments to study the evolution and regulation of mammalian globin gene clusters. Mol. Biol. Evol. (1993);10:73–102. [PubMed: 8383794]
  26. Jegalian K, Page D C. A proposed pathway by which genes common to mammalian X and Y chromosomes evolve to become X inactivated. Nature. (1998);394:776–780. [PubMed: 9723615]
  27. Jorde L B, Bamshad M, Rogers A R. Using mitochondrial and nuclear DNA markers to reconstruct human evolution. BioEssays. (1998);20:126–136. [PubMed: 9631658]
  28. Jurka J, Milosavljevic A. Reconstruction and analysis of human Alu genes. J. Mol. Evol. (1991);32:105–121. [PubMed: 1706781]
  29. Klein J, Takahata N, Ayala F J. MHC polymorphism and human origins. Scientific American. (1993);269:78–83. [PubMed: 8266061]
  30. Koop B F, Hood L. Striking sequence similarity over almost 100 kilobases of human and mouse T cell receptor DNA. Nature Genet. (1994);7:48–53. [PubMed: 8075639]
  31. Krings M, Stone A, Schmitz W, Krainitzki H, Stoneking M, Paabo S. Neandertal DNA sequences and the origins of modern humans. Cell. (1997);90:19–30. [PubMed: 9230299]
  32. Kumar S, Hedges S B. A molecular timescale for vertebrate evolution. Nature. (1998);392:917–920. [PubMed: 9582070]
  33. Lahn B T, Page D C. Functional coherence of the human Y chromosome. Science. (1997);278:675–680. [PubMed: 9381176]
  34. Lalioti M D, Scott H S, Buresi C. Dodecamer repeat expansion in cystatin B gene in progressive myoclonus epilepsy. Nature. (1997);386:847–851. [PubMed: 9126745]
  35. Lawn R M. How often has Lp(a) evolved? Clin. Genet. (1996);49:176–184.
  36. Li W H, Sadler L A. Low nucleotide diversity in man. Genetics. (1991);129:513–523. [PMC free article: PMC1204640] [PubMed: 1743489]
  37. Logsdon J M Jr. The recent origins of spliceosomal introns revisited. Curr. Opin. Genet. Dev. (1998);8:637–648. [PubMed: 9914210]
  38. Lopez-Garcia P, Moreira D. Metabolic symbiosis at the origin of eukaryotes. Trends Biochem. Sci. (1999);24:88–93. [PubMed: 10203753]
  39. Lundin L G. Evolution of the vertebrate genome as reflected in paralogous chromosomal regions in Man and the house mouse. Genomics. (1993);16:1–19. [PubMed: 8486346]
  40. Makalowski W, Boguski M S. Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences. Proc. Natl Acad. Sci. USA. (1998);95:9407–9412. [PMC free article: PMC21351] [PubMed: 9689093]
  41. Martin W, Muller M. The hydrogen hypothesis for the first eukaryote. Nature. (1998);392:37–41. [PubMed: 9510246]
  42. Meyer A, Malaga-Trillo E. Vertebrate genomics: more fishy tales about Hox genes. Curr. Biol. (1999);9:R210–R213. [PubMed: 10209088]
  43. Miller A P, Willard H F. Chromosomal basis of X chromosome inactivation: identification of a multigene domain in Xp11.21-p11.22 that escapes X inactivation. Proc. Natl Acad. Sci. USA. (1998);95:8709–8714. [PMC free article: PMC21141] [PubMed: 9671743]
  44. Moran J V, DeBerardinis R J, Kazazian H H Jr. Exon shuffling by L1 retrotransposition. Science. (1999);283:1530–1534. [PubMed: 10066175]
  45. Murphy P M. Molecular mimicry and the generation of host defense protein diversity. Cell. (1993);72:823–826. [PubMed: 8458078]
  46. O'Brien S J, Womack J E, Lyons L A, Moore K J, Jenkins N A, Copeland N G. Anchored reference loci for comparative genome mapping in mammals. Nature Genet. (1993);3:103–112. [PubMed: 8499943]
  47. Ohno S. Ancient linkage groups and frozen accidents. Nature. (1973);244:259–262. [PubMed: 4200792]
  48. Ottolenghi C, Vekemans M. Genetic divergence between mouse and humans: a useful direction for gene pathway analysis. Teratology. (1998);58:82–87. [PubMed: 9802187]
  49. Pamilo P, O'Neill R J. Evolution of the Sry genes. Mol. Biol. Evol. (1997);14:49–55. [PubMed: 9000753]
  50. Patthy L. Exons and introns. Curr. Opin. Struct. Biol. (1994);4:383–392.
  51. Perry J, Feather S, Smith A, Palmer S, Ashworth A. The human FXY gene is located within Xp22.3: implications for evolution of the mammalian X chromosome. Hum. Mol. Genet. (1998);7:299–305. [PubMed: 9425238]
  52. Rappold G A. The pseudoautosomal regions of the human sex chromosomes. Hum. Genet. (1993);92:315–324. [PubMed: 8225310]
  53. Ried K, Rao E, Schiebel K, Rappold G A. Gene duplications as a recurrent theme in the evolution of the human pseudoautosomal region 1: isolation of the gene ASMTL. Hum. Mol. Genet. (1998);7:1771–1778. [PubMed: 9736779]
  54. Rivera M C, Jain R, Moore J E, Lake J A. Genomic evidence for two functionally distinct gene classes. Proc. Natl Acad. Sci. USA. (1998);95:6239–6244. [PMC free article: PMC27643] [PubMed: 9600949]
  55. Royle N J, Baird D M, Jeffreys A J. A subterminal satellite located adjacent to telomeres in chimpanzee is absent from the human genome. Nature Genet. (1994);6:52–56. [PubMed: 8136835]
  56. Rubinsztein D C, Amos W, Leggo J, Goodburn S, Jain S, Li S H, Margolis R L, Ross C A, Ferguson-Smith M A. Microsatellite evolution - evidence for directionality and variation in rate between species. Nature Genet. (1995);10:337–343. [PubMed: 7670473]
  57. Ruddle F H, Bartels J L, Bentley K L, Kappen C, Murtha M T, Pendleton J W. Evolution of Hox genes. Annu. Rev. Genet. (1994);28:423–442. [PubMed: 7893134]
  58. Sabeur G, Macaya G, Kadi F, Bernardi G. The isochore patterns of mammalian genomes and their phylogenetic implications. J. Mol. Evol. (1993);37:93–108. [PubMed: 8411213]
  59. Sawyer J R, Hozier J C. High resolution of mouse chromosomes: banding conservation between Man and mouse. Science. (1986);232:1632–1635. [PubMed: 3715469]
  60. Skrabanek L, Wolfe K H. Eukaryote genome duplication - where's the evidence? Curr. Opin. Genet. Dev. (1998);8:694–700. [PubMed: 9914206]
  61. Stallings R L. Conservation and evolution of (CT)n/(GA)n microsatellite sequences at orthologous positions in diverse mammalian genomes. Genomics. (1995);25:107–113. [PubMed: 7774907]
  62. Stoltzfus A, Spencer D F, Zuker M, Logsdon J M Jr, Doolittle W F. Testing the exon theory of genes - the evidence from protein structure. Science. (1994);265:202–207. [PubMed: 8023140]
  63. Strachan T (1992) The Human Genome, pp. 32–33. BIOS Scientific Publishers, Oxford.
  64. Tagle D A, Stanhope M J, Siemieniak D R, Benson P, Goodman M, Slightom J L. The β globin gene cluster of the prosimian primate Galago crassicaudatus: nucleotide sequence determination of the 41 kb cluster and comparative sequence analyses. Genomics. (1992);13:741–760. [PubMed: 1639402]
  65. Ullu E, Tschudi C. Alu sequences are processed 7SL RNA genes. Nature. (1984);312:171–172. [PubMed: 6209580]
  66. von Haeseler A, Sajantila A, Paabo S. The genetical archaeology of the human genome. Nature Genet. (1996);14:135–140. [PubMed: 8841181]
  67. Weller P A, Critcher R, Goodfellow P N, German J, Ellis N A. The human Y chromosome homolog of XG: transcription of a naturally truncated gene. Hum. Mol. Genet. (1995);4:859–868. [PubMed: 7633446]
  68. Yunis J J, Prakash O. The origin of man: a chromosomal pictorial legacy. Science. (1982);215:1525–1530. [PubMed: 7063861]
  69. Zietkiewicz E, Richer C, Makalowski W, Jurka J, Labuda D. A young Alu subfamily amplified independently in human and African great apes lineages. Nucleic Acids Res. (1994);22:5608–5612. [PMC free article: PMC310123] [PubMed: 7838713]
Copyright © 1999, Garland Science.
Bookshelf ID: NBK7565