NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Alberts B, Johnson A, Lewis J, et al. Molecular Biology of the Cell. 4th edition. New York: Garland Science; 2002.


The Diversity of Genomes and the Tree of Life

The success of living organisms based on DNA, RNA, and protein, out of the infinitude of other chemical forms that one might conceive of, has been spectacular. They have populated the oceans, covered the land, infiltrated the Earth's crust, and molded the surface of our planet. Our oxygen-rich atmosphere, the deposits of coal and oil, the layers of iron ores, the cliffs of chalk and limestone and marble—all these are products, directly or indirectly, of past biological activity on Earth.

Living things are not confined to the familiar temperate realm of land, water, and sunlight inhabited by plants and plant-eating animals. They can be found in the darkest depths of the ocean, in hot volcanic mud, in pools beneath the frozen surface of the Antarctic, and buried kilometers deep in the Earth's crust. The creatures that live in these extreme environments are unfamiliar, not only because they are inaccessible, but also because they are mostly microscopic. In more homely habitats, too, most organisms are too small for us to see without special equipment: they tend to go unnoticed, unless they cause a disease or rot the timbers of our houses. Yet microorganisms make up most of the total mass of living matter on our planet. Only recently, through new methods of molecular analysis and specifically through the analysis of DNA sequences, have we begun to get a picture of life on Earth that is not grossly distorted by our biased perspective as large animals living on dry land.

In this section we consider the diversity of organisms and the relationships among them. Because the genetic information for every organism is written in the universal language of DNA sequences, and the DNA sequence of any given organism can be obtained by standard biochemical techniques, it is now possible to characterize, catalogue, and compare any set of living organisms with reference to these sequences. From such comparisons we can estimate the place of each organism in the family tree of living species—the ‘tree of life’. But before describing what this approach reveals, we need first to consider the routes by which cells in different environments obtain the matter and energy they require to survive and proliferate, and the ways in which some classes of organisms depend on others for their basic chemical needs.

Cells Can Be Powered by a Variety of Free Energy Sources

Living organisms obtain their free energy in different ways. Some, such as animals, fungi, and the bacteria that live in the human gut, get it by feeding on other living things or the organic chemicals they produce; such organisms are called organotrophic (from the Greek word trophe, meaning “food”). Others derive their energy directly from the nonliving world. These fall into two classes: those that harvest the energy of sunlight, and those that capture their energy from energy-rich systems of inorganic chemicals in the environment (chemical systems that are far from chemical equilibrium). Organisms of the former class are called phototrophic (feeding on sunlight); those of the latter are called lithotrophic (feeding on rock). Organotrophic organisms could not exist without these primary energy converters, which constitute the largest mass of living matter on Earth.

Phototrophic organisms include many types of bacteria, as well as algae and plants, on which we—and virtually all the living things that we ordinarily see around us—depend. Phototrophic organisms have changed the whole chemistry of our environment: the oxygen in the Earth's atmosphere is a by-product of their biosynthetic activities.

Lithotrophic organisms are not such an obvious feature of our world, because they are microscopic and mostly live in habitats that humans do not frequent—deep in the ocean, buried in the Earth's crust, or in various other inhospitable environments. But they are a major part of the living world, and are especially important in any consideration of the history of life on Earth.

Some lithotrophs get energy from aerobic reactions, which use molecular oxygen from the environment; since atmospheric O2 is ultimately the product of living organisms, these aerobic lithotrophs are, in a sense, feeding on the products of past life. There are, however, other lithotrophs that live anaerobically, in places where little or no molecular oxygen is present, in circumstances similar to those that must have existed in the early days of life on Earth, before oxygen had accumulated.

The most dramatic of these sites are the hot hydrothermal vents found deep down on the floor of the Pacific and Atlantic Oceans, in regions where the ocean floor is spreading as new portions of the Earth's crust form by a gradual upwelling of material from the Earth's interior (Figure 1-15). Downward-percolating seawater is heated and driven back upward as a submarine geyser, carrying with it a current of chemicals from the hot rocks below. A typical cocktail might include H2S, H2, CO, Mn2+, Fe2+, Ni2+, CH4, NH4+, and phosphorus-containing compounds. A dense population of bacteria lives in the neighborhood of the vent, thriving on this austere diet and harvesting free energy from reactions between the available chemicals. Other organisms (clams, mussels, and giant marine worms) in turn live off the bacteria at the vent, forming an entire ecosystem analogous to the system of plants and animals that we belong to, but powered by geochemical energy instead of light (Figure 1-16).

Figure 1-15. The geology of a hot hydrothermal vent in the ocean floor. Water percolates down toward the hot molten rock upwelling from the Earth's interior and is heated and driven back upward, carrying minerals leached from the hot rock. A temperature gradient ...

Figure 1-16. Living organisms at a hot hydrothermal vent. Close to the vent, at temperatures up to about 150°C, various lithotrophic species of bacteria and archaea (archaebacteria) live, directly fuelled by geochemical energy. A little further away, where ...

Some Cells Fix Nitrogen and Carbon Dioxide for Others

To make a living cell requires matter, as well as free energy. DNA, RNA, and protein are composed of just six elements: hydrogen, carbon, nitrogen, oxygen, sulfur, and phosphorus. These are all plentiful in the nonliving environment, in the Earth's rocks, water, and atmosphere, but not in chemical forms that allow easy incorporation into biological molecules. Atmospheric N2 and CO2, in particular, are extremely unreactive, and a large amount of free energy is required to drive the reactions that use these inorganic molecules to make the organic compounds needed for further biosynthesis—that is, to fix nitrogen and carbon dioxide, so as to make N and C available to living organisms. Many types of living cells lack the biochemical machinery to achieve this fixation, and rely on other classes of cells to do the job for them. We animals depend on plants for our supplies of organic carbon and nitrogen compounds. Plants in turn, although they can fix carbon dioxide from the atmosphere, lack the ability to fix atmospheric nitrogen, and they depend in part on nitrogen-fixing bacteria to supply their need for nitrogen compounds. Plants of the pea family, for example, harbor symbiotic nitrogen-fixing bacteria in nodules in their roots.

Living cells therefore differ widely in some of the most basic aspects of their biochemistry. Not surprisingly, cells with complementary needs and capabilities have developed close associations. Some of these associations, as we see below, have evolved to the point where the partners have lost their separate identities altogether: they have joined forces to form a single composite cell.

The Greatest Biochemical Diversity Is Seen Among Procaryotic Cells

From simple microscopy, it has long been clear that living organisms can be classified on the basis of cell structure into two groups: the eucaryotes and the procaryotes. Eucaryotes keep their DNA in a distinct membrane-bounded intracellular compartment called the nucleus. (The name is from the Greek, meaning “truly nucleated,” from the words eu, “well” or “truly,” and karyon, “kernel” or “nucleus”.) Procaryotes have no distinct nuclear compartment to house their DNA. Plants, fungi, and animals are eucaryotes; bacteria are procaryotes.

Most procaryotic cells are small and simple in outward appearance, and they live mostly as independent individuals, rather than as multicellular organisms. They are typically spherical or rod-shaped and measure a few micrometers in linear dimension (Figure 1-17). They often have a tough protective coat, called a cell wall, beneath which a plasma membrane encloses a single cytoplasmic compartment containing DNA, RNA, proteins, and the many small molecules needed for life. In the electron microscope, this cell interior appears as a matrix of varying texture without any discernible organized internal structure (Figure 1-18).

Figure 1-17. Shapes and sizes of some bacteria. Although most are small, as shown, there are also some giant species. An extreme example (not shown) is the cigar-shaped bacterium Epulopiscium fishelsoni, which lives in the gut of the surgeon fish and can be up to ...

Figure 1-18. The structure of a bacterium. (A) The bacterium Vibrio cholerae, showing its simple internal organization. Like many other species, Vibrio has a helical appendage at one end (a flagellum) that rotates as a propeller to drive the cell forward. ...

Procaryotic cells live in an enormous variety of ecological niches, and they are astonishingly varied in their biochemical capabilities—far more so than eucaryotic cells. There are organotrophic species that can utilize virtually any type of organic molecule as food, from sugars and amino acids to hydrocarbons and methane gas. There are many phototrophic species (Figure 1-19), harvesting light energy in a variety of ways, some of them generating oxygen as a byproduct, others not. And there are lithotrophic species that can feed on a plain diet of inorganic nutrients, getting their carbon from CO2, and relying on H2S to fuel their energy needs (Figure 1-20)—or on H2, or Fe2+, or elemental sulfur, or any of a host of other chemicals that occur in the environment.

Figure 1-19. The phototrophic bacterium Anabaena cylindrica viewed in the light microscope. The cells of this species form long, multicellular filaments. Most of the cells (labeled V) perform photosynthesis, while others become specialized for nitrogen fixation (labeled ...

Figure 1-20. A lithotrophic bacterium. Beggiatoa, which lives in sulfurous environments, gets its energy by oxidizing H2S and can fix carbon even in the dark. Note the yellow deposits of sulfur inside the cells. (Courtesy of Ralph W. Wolfe.)

Many parts of this world of microscopic organisms are virtually unexplored. Traditional methods of bacteriology have given us a fair acquaintance with those species that can be isolated and cultured in the laboratory. But DNA sequence analysis of the populations of bacteria in fresh samples from natural habitats—such as soil or ocean water, or even the human mouth—has opened our eyes to the fact that most species cannot be cultured by standard laboratory techniques. According to one estimate, at least 99% of procaryotic species remain to be characterized.

The Tree of Life Has Three Primary Branches: Bacteria, Archaea, and Eucaryotes

The classification of living things has traditionally depended on comparisons of their outward appearances: we can see that a fish has eyes, jaws, backbone, brain, and so on, just as we do, and that a worm does not; that a rosebush is cousin to an apple tree, but less similar to a grass. We can readily interpret such close family resemblances in terms of evolution from common ancestors, and we can find the remains of many of these ancestors preserved in the fossil record. In this way, it has been possible to begin to draw a family tree of living organisms, showing the various lines of descent, as well as branch points in the history, where the ancestors of one group of species became different from those of another.

When the disparities between organisms become very great, however, these methods begin to fail. How are we to decide whether a fungus is closer kin to a plant or to an animal? When it comes to procaryotes, the task becomes harder still: one microscopic rod or sphere looks much like another. Microbiologists have therefore sought to classify procaryotes in terms of their biochemistry and nutritional requirements. But this approach also has its pitfalls. Amid the bewildering variety of biochemical behaviors, it is difficult to know which differences truly reflect differences of evolutionary history.

Genome analysis has transformed the problem, giving us a simpler, more direct, and more powerful way to determine evolutionary relationships. The complete DNA sequence of an organism defines the species with almost perfect precision and in exhaustive detail. Moreover, this specification, once we have determined it, is in a digital form—a string of letters—that can be fed directly into a computer and compared with the corresponding information for any other living thing. Because DNA is subject to random changes that accumulate over long periods of time (as we shall see shortly), the number of differences between the DNA sequences of two organisms can be used to provide a direct, objective, quantitative indication of the evolutionary distance between them.
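The idea of counting differences between two sequences can be sketched in a few lines of Python. This is only a minimal illustration, assuming the sequences have already been aligned to the same length; the three short sequences and the function names are invented for the example and do not come from any real organism.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count the aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions that differ between two sequences."""
    return count_differences(seq_a, seq_b) / len(seq_a)


# Hypothetical aligned fragments, invented for illustration only.
organism_a = "ACGTACGTAC"
organism_b = "ACGTTCGTAC"
organism_c = "ACGATCGAAC"

for name, other in [("b", organism_b), ("c", organism_c)]:
    print(f"a vs {name}: {count_differences(organism_a, other)} differences, "
          f"p-distance {p_distance(organism_a, other):.2f}")
```

On this toy data, organism b would be judged a closer relative of a than organism c is. A raw count of this kind tends to underestimate the distance between very divergent sequences, because a single position may have changed more than once.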

This approach has shown that some of the organisms that were traditionally classed together as “bacteria” are as widely divergent in their evolutionary origins as is any procaryote from any eucaryote. It now appears that the procaryotes comprise two distinct groups that diverged early in the history of life on Earth, either before the ancestors of the eucaryotes diverged as a separate group or at about the same time. The two groups of procaryotes are called the bacteria (or eubacteria) and the archaea (or archaebacteria). The living world therefore has three major divisions or domains: bacteria, archaea, and eucaryotes (Figure 1-21).

Figure 1-21. The three major divisions (domains) of the living world. Note that traditionally the word bacteria has been used to refer to procaryotes in general, but more recently has been redefined to refer to eubacteria specifically. Where there might be ambiguity, ...

Archaea were initially discovered as inhabitants of environments that we humans avoid, such as bogs, sewage farms, ocean depths, salt brines, and hot acid springs, although it is now known that they are also widespread in less extreme and more homely environments, from soils and lakes to the stomachs of cattle. In outward appearance they are not easily distinguished from the more familiar eubacteria. At a molecular level, archaea seem to resemble eucaryotes more closely in their machinery for handling genetic information (replication, transcription, and translation), but eubacteria more closely in their apparatus for metabolism and energy conversion. We discuss below how this might be explained.

Some Genes Evolve Rapidly; Others Are Highly Conserved

Both in the storage and in the copying of genetic information, random accidents and errors occur, altering the nucleotide sequence—that is, creating mutations. Therefore, when a cell divides, its two daughters are often not quite identical to one another or to their parent. On rare occasions, the error may represent a change for the better; more probably, it will cause no significant difference in the cell's prospects; and in many cases, the error will cause serious damage—for example, by disrupting the coding sequence for a key protein. Changes due to mistakes of the first type will tend to be perpetuated, because the altered cell has an increased likelihood of reproducing itself. Changes due to mistakes of the second type—selectively neutral changes—may be perpetuated or not: in the competition for limited resources, it is a matter of chance whether the altered cell or its cousins will succeed. But changes that cause serious damage lead nowhere: the cell that suffers them dies, leaving no progeny. Through endless repetition of this cycle of error and trial—of mutation and natural selection—organisms evolve: their genetic specifications change, giving them new ways to exploit the environment more effectively, to survive in competition with others, and to reproduce successfully.

Clearly, some parts of the genome change more easily than others in the course of evolution. A segment of DNA that does not code for protein and has no significant regulatory role is free to change at a rate limited only by the frequency of random errors. In contrast, a gene that codes for a highly optimized essential protein or RNA molecule cannot alter so easily: when mistakes occur, the faulty cells are almost always eliminated. Genes of this latter sort are therefore highly conserved. Through 3.5 billion years or more of evolutionary history, many features of the genome have changed beyond all recognition; but the most highly conserved genes remain perfectly recognizable in all living species.

These latter genes are the ones that must be examined if we wish to trace family relationships between the most distantly related organisms in the tree of life. The studies that led to the classification of the living world into the three domains of bacteria, archaea, and eucaryotes were based chiefly on analysis of one of the ribosomal RNA subunits—the so-called 16S RNA, which is about 1500 nucleotides long. Because the process of translation is fundamental to all living cells, this component of the ribosome has been well conserved since early in the history of life on Earth (Figure 1-22).

Figure 1-22. Genetic information conserved since the beginnings of life. A part of the gene for the smaller of the two main RNA components of the ribosome is shown. Corresponding segments of nucleotide sequence from an archaean (Methanococcus jannaschii), a eubacterium ...

Most Bacteria and Archaea Have 1000–4000 Genes

Natural selection has generally favored those procaryotic cells that can reproduce the fastest by taking up raw materials from their environment and replicating themselves most efficiently, at the maximal rate permitted by the available food supplies. Small size implies a large ratio of surface area to volume, thereby helping to maximize the uptake of nutrients across the plasma membrane and boosting a cell's reproductive rate.
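The relation between size and surface-to-volume ratio can be checked with a short calculation. The sketch below assumes an idealized spherical cell, and the two radii (1 and 10 micrometers) are illustrative choices rather than measurements; for a sphere the ratio works out to 3 divided by the radius.

```python
import math


def sphere_surface_to_volume(radius_um: float) -> float:
    """Surface area divided by volume for a sphere, in units of 1/micrometer."""
    surface = 4.0 * math.pi * radius_um ** 2
    volume = (4.0 / 3.0) * math.pi * radius_um ** 3
    return surface / volume


# Illustrative radii only: a small cell and a cell ten times larger.
small_cell = sphere_surface_to_volume(1.0)
large_cell = sphere_surface_to_volume(10.0)
print(f"small cell: {small_cell:.2f} per micrometer")
print(f"large cell: {large_cell:.2f} per micrometer")
```

Under these assumptions, a tenfold increase in radius gives a tenfold decrease in the membrane area available per unit of cell volume.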

Presumably for these reasons, most procaryotic cells carry very little superfluous baggage; their genomes are small and compact, with genes packed closely together and minimal quantities of regulatory DNA between them. The small genome size makes it relatively easy to determine the complete DNA sequence. We now have this information for many species of eubacteria and archaea, and a few species of eucaryotes. As shown in Table 1-1, most eubacterial and archaean genomes contain between 10^6 and 10^7 nucleotide pairs, encoding 1000–4000 genes.

Table 1-1. Some Genomes That Have Been Completely Sequenced.

A complete DNA sequence reveals both the genes an organism possesses and the genes it lacks. When we compare the three domains of the living world, we can begin to see which genes are common to all of them and must therefore have been present in the cell that was ancestral to all present-day living things, and which genes are peculiar to a single branch in the tree of life. To explain the findings, however, we need to consider a little more closely how new genes arise and genomes evolve.

New Genes Are Generated from Preexisting Genes

The raw material of evolution is the DNA sequence that already exists: there is no natural mechanism for making long stretches of new random sequence. In this sense, no gene is ever entirely new. Innovation can, however, occur in several ways (Figure 1-23):

  • Intragenic mutation: an existing gene can be modified by mutations in its DNA sequence.
  • Gene duplication: an existing gene can be duplicated so as to create a pair of closely related genes within a single cell.
  • Segment shuffling: two or more existing genes can be broken and rejoined to make a hybrid gene consisting of DNA segments that originally belonged to separate genes.
  • Horizontal (intercellular) transfer: a piece of DNA can be transferred from the genome of one cell to that of another—even to that of another species. This process is in contrast with the usual vertical transfer of genetic information from parent to progeny.
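The four modes listed above can be mimicked with simple string and list operations. The sketch below is a toy model, not a description of the molecular mechanisms: a gene is represented as a string of letters, a genome as a list of genes, and all function names and sequences are invented for the illustration.

```python
from __future__ import annotations


def intragenic_mutation(gene: str, position: int, new_base: str) -> str:
    """Return a copy of the gene with one base replaced."""
    return gene[:position] + new_base + gene[position + 1:]


def gene_duplication(genome: list[str], index: int) -> list[str]:
    """Return a genome in which the gene at `index` appears twice in a row."""
    return genome[: index + 1] + [genome[index]] + genome[index + 1:]


def segment_shuffling(gene_a: str, gene_b: str, cut_a: int, cut_b: int) -> str:
    """Join the start of gene_a (up to cut_a) to the end of gene_b (from cut_b)."""
    return gene_a[:cut_a] + gene_b[cut_b:]


def horizontal_transfer(donor: list[str], recipient: list[str], index: int) -> list[str]:
    """Return the recipient genome with one donor gene appended."""
    return recipient + [donor[index]]


# Toy sequences, invented for illustration only.
genome = ["ATGCCC", "ATGAAA"]
print(intragenic_mutation(genome[0], 3, "T"))
print(gene_duplication(genome, 0))
print(segment_shuffling(genome[0], genome[1], 3, 3))
print(horizontal_transfer(["ATGGGG"], genome, 0))
```

Each function returns a new value rather than modifying its input, so the original and the altered sequence can be compared side by side.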

Figure 1-23. Four modes of genetic innovation and their effects on the DNA sequence of an organism.

Each of these types of change leaves a characteristic trace in the DNA sequence of the organism, providing clear evidence that all four processes have occurred. In later chapters we discuss the underlying mechanisms, but for the present we focus on the consequences.

Gene Duplications Give Rise to Families of Related Genes Within a Single Cell

A cell must duplicate its entire genome each time it divides into two daughter cells. However, accidents occasionally result in the duplication of just part of the genome, with retention of original and duplicate segments in a single cell. Once a gene has been duplicated in this way, one of the two gene copies is free to mutate and become specialized to perform a different function within the same cell. Repeated rounds of this process of duplication and divergence, over many millions of years, have enabled one gene to give rise to a whole family of genes within a single genome. Analysis of the DNA sequence of procaryotic genomes reveals many examples of such gene families: in Bacillus subtilis, for example, 47% of the genes have one or more obvious relatives (Figure 1-24).

Figure 1-24. Families of evolutionarily related genes in the genome of Bacillus subtilis. The biggest family consists of 77 genes coding for varieties of ABC transporters, a class of membrane transport proteins found in all three domains of the living world. ...

When genes duplicate and diverge in this way, the individuals of one species become endowed with multiple variants of a primordial gene. This evolutionary process has to be distinguished from the genetic divergence that occurs when one species of organism splits into two separate lines of descent at a branch point in the family tree—when the human line of descent became separate from that of chimpanzees, for example. There, the genes gradually become different in the course of evolution, but they are likely to continue to have corresponding functions in the two sister species. Genes that are related in this way—that is, genes in two separate species that derive from the same ancestral gene in the last common ancestor of those two species—are said to be orthologs. Related genes that have resulted from a gene duplication event within a single genome—and are likely to have diverged in their function—are said to be paralogs. Genes that are related by descent in either way are called homologs, a general term used to cover both types of relationship (Figure 1-25).

Figure 1-25. Paralogous genes and orthologous genes: two types of gene homology based on different evolutionary pathways. (A) and (B) The most basic possibilities. (C) A more complex pattern of events that can occur.

The family relationships between genes can become quite complex (Figure 1-26). For example, an organism that possesses a family of paralogous genes (for example, the seven hemoglobin genes α, β, γ, δ, ε, ζ, and θ) may evolve into two separate species (such as humans and chimpanzees) each possessing the entire set of paralogs. All 14 genes are homologs, with the human hemoglobin α orthologous to the chimpanzee hemoglobin α, but paralogous to the human or chimpanzee hemoglobin β, and so on. Moreover, the vertebrate hemoglobins (the oxygen-binding proteins of blood) are homologous to the vertebrate myoglobins (the oxygen-binding proteins of muscle), as well as to more distant genes that code for oxygen-binding proteins in invertebrates and plants. From the DNA sequences, it is usually easy to recognize that two genes in different species are homologous; it is much more difficult to decide, without other information, whether they are orthologs.
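The hemoglobin example can be written out as a small bookkeeping exercise. The sketch below assumes the simplified scenario described in the text: all seven duplications preceded the split between the two species, and no gene has since been lost. Under that assumption, two genes with the same name in different species are orthologs, and any two genes with different names are paralogs. The labels and the function name are choices made for this illustration.

```python
from itertools import combinations, product

GLOBINS = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "theta"]
SPECIES = ["human", "chimpanzee"]

# Every gene is a (species, globin name) pair: 2 species x 7 globins.
genes = list(product(SPECIES, GLOBINS))


def relationship(gene_1: tuple, gene_2: tuple) -> str:
    """Classify two distinct homologous genes under the simplified scenario."""
    if gene_1 == gene_2:
        raise ValueError("a gene cannot be compared with itself")
    _, name_1 = gene_1
    _, name_2 = gene_2
    # Same globin in the two species: descended from one gene in the
    # last common ancestor, hence orthologs.
    if name_1 == name_2:
        return "ortholog"
    # Different globins trace back to a duplication event, hence paralogs.
    return "paralog"


counts = {"ortholog": 0, "paralog": 0}
for gene_1, gene_2 in combinations(genes, 2):
    counts[relationship(gene_1, gene_2)] += 1

print(len(genes), counts)
```

The rule works here only because the evolutionary history is given in advance. With real sequence data the history has to be inferred, which is why deciding whether two homologs are orthologs is the harder problem.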

Figure 1-26. A complex family of homologous genes. This diagram shows the pedigree of the hemoglobin (Hb), myoglobin, and globin genes of human, chick, shark, and Drosophila. The lengths of the lines represent the amount of divergence in amino acid sequence.

Genes Can Be Transferred Between Organisms, Both in the Laboratory and in Nature

Procaryotes also provide examples of the horizontal transfer of genes from one species of cell to another. The most obvious tell-tale signs are sequences recognizable as being derived from bacterial viruses, also called bacteriophages (Figure 1-27). These small packets of genetic material have evolved as parasites on the reproductive and biosynthetic machinery of host cells. They replicate in one cell, emerge from it with a protective wrapping, and then enter and infect another cell, which may be of the same or a different species. Inside a cell, they may either remain as separate fragments of DNA, known as plasmids, or insert themselves into the DNA of the host cell and become part of its regular genome. In their travels, viruses can accidentally pick up fragments of DNA from the genome of one host cell and ferry them into another cell. Such transfers of genetic material frequently occur in procaryotes, and they can also occur between eucaryotic cells of the same species.

Figure 1-27. The viral transfer of DNA from one cell to another. (A) An electron micrograph of particles of a bacterial virus, the T4 bacteriophage. The head of this virus contains the viral DNA; the tail contains apparatus for injecting the DNA into a host bacterium. ...

Horizontal transfers of genes between eucaryotic cells of different species are very rare, and they do not seem to have played a significant part in eucaryote evolution. In contrast, horizontal gene transfers occur much more frequently between different species of procaryotes. Many procaryotes have a remarkable capacity to take up even nonviral DNA molecules from their surroundings and thereby capture the genetic information these molecules carry. This enables bacteria in the wild to acquire genes from neighboring cells relatively easily. Genes that confer resistance to an antibiotic or an ability to produce a toxin, for example, can be transferred from species to species and provide the recipient bacterium with a selective advantage. In this way, new and sometimes dangerous strains of bacteria have been observed to evolve in the bacterial ecosystems that inhabit hospitals or the various niches in the human body. For example, horizontal gene transfer is responsible for the spread, over the past 40 years, of penicillin-resistant strains of Neisseria gonorrhoeae, the bacterium that causes gonorrhea. On a longer time scale, the results can be even more profound; it has been estimated that at least 18% of all of the genes in the present-day genome of E. coli have been acquired by horizontal transfer from another species within the past 100 million years.

Horizontal Exchanges of Genetic Information Within a Species Are Brought About by Sex

Horizontal exchanges of genetic information have an important role in bacterial evolution in today's world, and they may have occurred even more frequently and promiscuously in the early days of life on Earth. Indeed, it has been suggested that the genomes of present-day eubacteria, archaea, and eucaryotes originated not by divergent lines of descent from a single genome in a single ancestral type of cell, but rather as three independent anthologies of genes that have survived from the pool of genes in a primordial community of diverse cells in which genes were frequently exchanged (Figure 1-28). This could explain the otherwise puzzling observation that the eucaryotes seem more similar to archaea in their genes for the basic information-handling processes of DNA replication, transcription, and translation, but more similar to eubacteria in their genes for metabolic processes.

Figure 1-28. Horizontal gene transfers in early evolution. In the early days of life on Earth, cells may have been less capable of maintaining their separate identities and may have exchanged genes much more readily than now. In this way, the archaean, eubacterial, ...

Horizontal gene transfer among bacteria may seem a surprising process, but it has a parallel in a phenomenon familiar to us all: sex. Sexual reproduction causes a large-scale horizontal transfer of genetic information between two initially separate cell lineages—those of the father and the mother. A key feature of sex, of course, is that the genetic exchange normally occurs only between individuals of the same species. But no matter whether they occur within a species or between species, horizontal gene transfers leave a characteristic imprint: they result in individuals who are related more closely to one set of relatives with respect to some genes, and more closely to another set of relatives with respect to others. By comparing the DNA sequences of individual human genomes, an intelligent visitor from outer space could deduce that humans reproduce sexually, even if it knew nothing about human behavior.

Sexual reproduction is a widespread (although not universal) phenomenon, especially among eucaryotes. Even bacteria indulge from time to time in controlled sexual exchanges of DNA with other members of their own species. Natural selection has clearly favored organisms that are capable of this behavior, although evolutionary theorists still dispute precisely what the selective advantage of sex is.

The Function of a Gene Can Often Be Deduced from Its Sequence

Family relationships among genes are important not just for their historical interest, but because they lead to a spectacular simplification in the task of deciphering gene functions. Once the sequence of a newly discovered gene has been determined, it is now possible, by tapping a few keys on a computer, to search the entire database of known gene sequences for genes related to it. In many cases, the function of one or more of these homologs will have been already determined experimentally, and thus, since gene sequence determines gene function, one can frequently make a good guess at the function of the new gene: it is likely to be similar to that of the already-known homologs.
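A database search of this kind can be imitated on a very small scale. The sketch below is a toy stand-in, not the method used by real search tools, which rely on sequence alignment: it simply scores two sequences by the fraction of short subsequences (here, triplets) that they share. The database entries, their names, and the query are all invented for the illustration.

```python
def kmers(sequence: str, k: int = 3) -> set:
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def similarity(seq_a: str, seq_b: str, k: int = 3) -> float:
    """Shared k-mers divided by total distinct k-mers (Jaccard index)."""
    kmers_a, kmers_b = kmers(seq_a, k), kmers(seq_b, k)
    if not kmers_a or not kmers_b:
        return 0.0
    return len(kmers_a & kmers_b) / len(kmers_a | kmers_b)


def best_match(query: str, database: dict, k: int = 3) -> str:
    """Return the name of the database entry most similar to the query."""
    return max(database, key=lambda name: similarity(query, database[name], k))


# Hypothetical database of genes with known functions (invented sequences).
database = {
    "transporter_like": "ATGGCTAAAGGTCTGACC",
    "kinase_like": "ATGTTTCCCGGGAAATAG",
}
new_gene = "ATGGCTAAAGGTCTGTCC"

hit = best_match(new_gene, database)
print(hit, round(similarity(new_gene, database[hit]), 2))
```

The name of the best hit then serves as a first guess at the function of the new gene, to be confirmed by experiment.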

In this way, it becomes possible to decipher a great deal of the biology of an organism simply by analyzing the DNA sequence of its genome and using the information we already have about the functions of genes in other organisms that have been more intensively studied. Mycobacterium tuberculosis, the eubacterium that causes tuberculosis, is extremely difficult to study experimentally in the laboratory and provides an example of the power of comparative genomics. DNA sequencing has revealed that this organism has a genome of 4,411,529 nucleotide pairs, containing approximately 4000 genes. Of these genes, 40% were immediately recognizable (when the genome was sequenced, in 1998) as homologs of known genes in other species, and could be tentatively assigned a function on that basis. Another 44% showed some informative similarity to other known genes—for example, containing a conserved protein domain within a longer amino acid sequence. Only 16% of the 4000 genes were totally unfamiliar. As we saw also for Bacillus subtilis (see Figure 1-24), about half the genes have sequences closely similar to those of other genes in the M. tuberculosis genome, showing that they must have arisen through relatively recent gene duplications. Compared with other bacteria, M. tuberculosis contains an exceptionally large number of genes coding for enzymes involved in the synthesis and degradation of lipid (fatty) molecules. This presumably reflects this bacterium's production of an unusual outer coat that is rich in these substances; the coat, and the enzymes that produce it, may explain how M. tuberculosis escapes destruction by the immune system of tuberculosis patients.
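
The percentages quoted above translate into approximate gene counts as follows. This is simple arithmetic on the figures given in the text; because the gene total is itself approximate (about 4000), the counts should be read as rough estimates, and the category labels are shorthand introduced here.

```python
genome_size_bp = 4_411_529   # nucleotide pairs, as quoted in the text
total_genes = 4000           # approximate gene count, as quoted in the text

# Percentage of genes in each category at the time of sequencing (1998).
percent = {
    "recognizable homolog": 40,
    "some informative similarity": 44,
    "totally unfamiliar": 16,
}

# Integer arithmetic avoids floating-point rounding surprises.
counts = {label: total_genes * p // 100 for label, p in percent.items()}

# Average stretch of genome per gene, including any noncoding DNA.
bp_per_gene = genome_size_bp / total_genes

for label, n in counts.items():
    print(f"{label}: about {n} genes")
print(f"about {bp_per_gene:.0f} nucleotide pairs per gene on average")
```

On these figures, roughly 3400 of the 4000 genes could be given at least a tentative functional clue from sequence comparison alone.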

More Than 200 Gene Families Are Common to All Three Primary Branches of the Tree of Life

Given the complete genome sequences of representative organisms from all three domains—archaea, eubacteria, and eucaryotes—one can search systematically for homologies that span this enormous evolutionary divide. In this way we can begin to take stock of the common inheritance of all living things. There are considerable difficulties in this enterprise. For example, individual species have often lost some of the ancestral genes; other genes have probably been acquired by horizontal transfer from another species and therefore may not be truly ancestral, even though shared. Recent genome comparisons strongly suggest that both lineage-specific gene loss and horizontal gene transfer, in some cases between evolutionarily distant species, have been major factors of evolution, at least in the procaryotic world. Finally, in the course of 2 or 3 billion years, some genes that were initially shared will have changed beyond recognition by current methods.

Because of all these vagaries of the evolutionary process, it seems that only a small proportion of ancestral gene families have been universally retained in a recognizable form. Thus, out of 2264 protein-coding gene families recently defined by comparing the genomes of 18 bacteria, 6 archaeans, and 1 eucaryote (yeast), only 76 are truly ubiquitous (that is, represented in all the genomes analyzed). The great majority of these universal families include components of the translation and transcription systems. This set of 76 is unlikely to be a realistic approximation of the ancestral gene set. A better, though still crude, idea of the ancestral set can be obtained by tallying the gene families that have representatives in multiple, but not necessarily all, species from all three domains. Such an analysis reveals 239 ancient conserved families. With a single exception, these families can be assigned a function (at least in terms of general biochemical activity, but usually with more precision), with the largest number of shared gene families being involved in translation and ribosome production and in amino acid metabolism and transport (Table 1-2). This set of highly conserved gene families represents only a very rough sketch of the common inheritance of all modern life; a more precise reconstruction of the gene complement of the last universal common ancestor might be feasible with further genome sequencing and more careful comparative analysis.
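
The two tallies described above differ only in the criterion applied to each gene family, as the following sketch shows. The species names, family names, and presence data are invented, and the looser criterion is simplified here to "at least one species from each of the three groups", which is an assumption made for brevity rather than the rule used in the published comparison. The last lines put the two counts quoted in the text into proportion.

```python
# Invented toy data: which group each hypothetical genome belongs to.
DOMAIN_OF = {
    "bact_1": "bacteria", "bact_2": "bacteria", "bact_3": "bacteria",
    "arch_1": "archaea", "arch_2": "archaea",
    "yeast": "eucaryote",
}

# Invented toy data: gene family -> genomes in which it was detected.
families = {
    "fam_ribosomal": {"bact_1", "bact_2", "bact_3", "arch_1", "arch_2", "yeast"},
    "fam_aa_metabolism": {"bact_1", "bact_3", "arch_2", "yeast"},
    "fam_bacterial_only": {"bact_1", "bact_2", "bact_3"},
    "fam_no_eucaryote": {"bact_2", "arch_1", "arch_2"},
}


def is_ubiquitous(genomes_with_family):
    """Strict criterion: the family is present in every genome analyzed."""
    return genomes_with_family == set(DOMAIN_OF)


def spans_all_domains(genomes_with_family):
    """Looser criterion (simplified): at least one genome from each group."""
    groups_hit = {DOMAIN_OF[g] for g in genomes_with_family}
    return groups_hit == set(DOMAIN_OF.values())


ubiquitous = sorted(f for f, g in families.items() if is_ubiquitous(g))
spanning = sorted(f for f, g in families.items() if spans_all_domains(g))
print(ubiquitous, spanning)

# Proportions implied by the counts quoted in the text.
pct_ubiquitous = round(100 * 76 / 2264, 1)
pct_spanning = round(100 * 239 / 2264, 1)
print(pct_ubiquitous, pct_spanning)
```

A family lost from a single lineage fails the strict test but still passes the looser one, which is why the second tally is the larger of the two.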

Table 1-2. The Numbers of Gene Families, Classified by Function, That Are Common to All Three Domains of the Living World.

Mutations Reveal the Functions of Genes

Without additional information, no amount of gazing at genome sequences will reveal the functions of genes. We may recognize that gene B is like gene A, but how do we discover the function of gene A in the first place? And even if we know the function of gene A, how do we test whether the function of gene B is truly the same as the sequence similarity suggests? How do we make the connection between the world of abstract genetic information and the world of real living organisms?

The analysis of gene functions depends heavily on two complementary approaches: genetics and biochemistry. Genetics starts with the study of mutants: we either find or make an organism in which a gene is altered, and examine the effects on the organism's structure and performance (Figure 1-29). Biochemistry examines the functions of molecules: we extract molecules from an organism and then study their chemical activities. By putting genetics and biochemistry together and examining the chemical abnormalities in a mutant organism, it is possible to find those molecules whose production depends on a given gene. At the same time, studies of the performance of the mutant organism show us what role those molecules have in the operation of the organism as a whole. Thus, genetics and biochemistry in combination provide a way to work out the connection between genes, molecules, and the structure and function of the organism.

Figure 1-29. A mutant phenotype reflecting the function of a gene. A normal yeast (of the species Schizosaccharomyces pombe) is compared with a mutant in which a change in a single gene has converted the cell from a cigar shape (left) to a T shape (right).

In recent years, DNA sequence information and the powerful tools of molecular biology have allowed rapid progress. From sequence comparisons, one can often identify particular domains within a gene that have been preserved nearly unchanged over the course of evolution. These conserved domains are likely to be the most important parts of the gene in terms of function. We can test their individual contributions to the activity of the gene product by creating in the laboratory mutations of specific sites within the gene, or by constructing artificial hybrid genes that combine part of one gene with part of another. Organisms can be engineered to make either the RNA or the protein specified by the gene in large quantities to facilitate biochemical analysis. Specialists in molecular structure can determine the three-dimensional conformation of the gene product, revealing the exact position of every atom in it. Biochemists can determine how each of the parts of the genetically specified molecule contributes to its chemical behavior. Cell biologists can analyze the behavior of cells that are engineered to express a mutant version of the gene.
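
The first step mentioned above, spotting stretches that have stayed unchanged across species, can be sketched as a scan over an alignment. The example assumes the sequences have already been aligned to equal length; the three sequences and species names are invented, and the minimum run length of four positions is an arbitrary threshold chosen for illustration.

```python
# Invented, pre-aligned protein fragments from three hypothetical species.
aligned = {
    "species_a": "MKTAYIAKQRGDSL",
    "species_b": "MRTAYIAKQRADTL",
    "species_c": "MKSAYIAKQRGETL",
}


def conserved_runs(seqs, min_len=4):
    """Return (start, end) index pairs of runs where all sequences agree.

    Indices are zero-based and the end index is exclusive.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    # True wherever every sequence has the same residue in that column.
    identical = [len(set(column)) == 1 for column in zip(*seqs)]
    runs, start = [], None
    # The trailing False closes a run that reaches the end of the alignment.
    for i, same in enumerate(identical + [False]):
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


sequences = list(aligned.values())
runs = conserved_runs(sequences)
segments = [sequences[0][start:end] for start, end in runs]
print(runs, segments)
```

Positions flagged in this way are natural first candidates for the site-specific mutations described in the text, although sequence conservation by itself only suggests, and does not prove, functional importance.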

There is, however, no one simple recipe for discovering a gene's function, and no simple standard universal format for describing it. We may discover, for example, that the product of a given gene catalyzes a certain chemical reaction, and yet have no idea how or why that reaction is important to the organism. The functional characterization of each new family of gene products, unlike the description of the gene sequences, presents a fresh challenge to the biologist's ingenuity. Moreover, the function of a gene is never fully understood until we learn its role in the life of the organism as a whole. To make ultimate sense of gene functions, therefore, we have to study whole organisms, not just molecules or cells.

Molecular Biologists Have Focused a Spotlight on E. coli

Because living organisms are so complex, the more we learn about any particular species, the more attractive it becomes as an object for further study. Each discovery raises new questions and provides new tools with which to tackle questions in the context of the chosen organism. For this reason, large communities of biologists have become dedicated to studying different aspects of the same model organism.

In the enormously varied world of bacteria, the spotlight of molecular biology has for a long time focused intensely on just one species: Escherichia coli, or E. coli (see Figures 1-17 and 1-18). This small, rod-shaped eubacterial cell normally lives in the gut of humans and other vertebrates, but it can be grown easily in a simple nutrient broth in a culture bottle. Evolution has optimized it to cope with variable chemical conditions and to reproduce rapidly. Its genetic instructions are contained in a single, circular molecule of DNA that is 4,639,221 nucleotide-pairs long, and it makes approximately 4300 different kinds of proteins (Figure 1-30).

Figure 1-30. The genome of E. coli. (A) A cluster of E. coli cells. (B) A diagram of the E. coli genome of 4,639,221 nucleotide pairs (for E. coli strain K-12). The diagram is circular because the DNA of E. coli, like that of other procaryotes, forms a single, closed ...

In molecular terms, we have a more thorough knowledge of the workings of E. coli than of any other living organism. Most of our understanding of the fundamental mechanisms of life—for example, how cells replicate their DNA to pass on the genetic instructions to their progeny, or how they decode the instructions represented in the DNA to direct the synthesis of specific proteins—has come from studies of E. coli. The basic genetic mechanisms have turned out to be highly conserved throughout evolution: these mechanisms are therefore essentially the same in our own cells as in E. coli.


Procaryotes (cells without a distinct nucleus) are biochemically the most diverse organisms and include species that can obtain all their energy and nutrients from inorganic chemical sources, such as the reactive mixtures of minerals released at hydrothermal vents on the ocean floor—the sort of diet that may have nourished the first living cells 3.5 billion years ago. DNA sequence comparisons reveal the family relationships of living organisms and show that the procaryotes fall into two groups that diverged early in the course of evolution: the bacteria (or eubacteria) and the archaea. Together with the eucaryotes (cells with a membrane-bounded nucleus), these constitute the three primary branches of the tree of life. Most bacteria and archaea are small unicellular organisms with compact genomes comprising 1000–4000 genes. Many of the genes within a single organism show strong family resemblances in their DNA sequences, implying that they originated from the same ancestral gene through gene duplication and divergence. Family resemblances (homologies) are also clear when gene sequences are compared between different species, and more than 200 gene families have been so highly conserved that they can be recognized as common to all three domains of the living world. Thus, given the DNA sequence of a newly discovered gene, it is often possible to deduce the gene's function from the known function of a homologous gene in an intensively studied model organism, such as the bacterium E. coli.

Copyright © 2002, Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter; Copyright © 1983, 1989, 1994, Bruce Alberts, Dennis Bray, Julian Lewis, Martin Raff, Keith Roberts, and James D. Watson.
Bookshelf ID: NBK26866