NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Alberts B, Johnson A, Lewis J, et al. Molecular Biology of the Cell. 4th edition. New York: Garland Science; 2002.


Studying Gene Expression and Function

Ultimately, one wishes to determine how genes—and the proteins they encode—function in the intact organism. Although it may sound counterintuitive, one of the most direct ways to find out what a gene does is to see what happens to the organism when that gene is missing. Studying mutant organisms that have acquired changes or deletions in their nucleotide sequences is a time-honored practice in biology. Because mutations can interrupt cellular processes, mutants often hold the key to understanding gene function. In the classical approach to the important field of genetics, one begins by isolating mutants that have an interesting or unusual appearance: fruit flies with white eyes or curly wings, for example. Working backward from the phenotype—the appearance or behavior of the individual—one then determines the organism's genotype, the form of the gene responsible for that characteristic (Panel 8-1).

Panel 8-1. Review of Classical Genetics.

Today, with numerous genome projects adding tens of thousands of nucleotide sequences to the public databases each day, the exploration of gene function often begins with a DNA sequence. Here the challenge is to translate sequence into function. One approach, discussed earlier in the chapter, is to search databases for well-characterized proteins that have similar amino acid sequences to the protein encoded by a new gene, and from there employ some of the methods described in the previous section to explore the gene's function further. But to tackle directly the problem of how a gene functions in a cell or organism, the most effective approach involves studying mutants that either lack the gene or express an altered version of it. Determining which cellular processes have been disrupted or compromised in such mutants will then frequently provide a window to a gene's biological role.

In this section, we describe several different approaches to determining a gene's function, whether one starts from a DNA sequence or from an organism with an interesting phenotype. We begin with the classical genetic approach to studying genes and gene function. These studies start with a genetic screen for isolating mutants of interest, and then proceed toward identification of the gene or genes responsible for the observed phenotype. We then review the collection of techniques that fall under the umbrella of reverse genetics, in which one begins with a gene or gene sequence and attempts to determine its function. This approach often involves some intelligent guesswork—searching for homologous sequences and determining when and where a gene is expressed—as well as generating mutant organisms and characterizing their phenotype.

The Classical Approach Begins with Random Mutagenesis

Before the advent of gene cloning technology, most genes were identified by the processes disrupted when the gene was mutated. This classical genetic approach—identifying the genes responsible for mutant phenotypes—is most easily performed in organisms that reproduce rapidly and are amenable to genetic manipulation, such as bacteria, yeasts, nematode worms, and fruit flies. Although spontaneous mutants can sometimes be found by examining extremely large populations—thousands or tens of thousands of individual organisms—the process of isolating mutants can be made much more efficient by generating mutations with agents that damage DNA. By treating organisms with mutagens, very large numbers of mutants can be created quickly and then screened for a particular defect of interest, as we will see shortly.

An alternative approach to chemical or radiation mutagenesis is called insertional mutagenesis. This method relies on the fact that exogenous DNA inserted randomly into the genome can produce mutations if the inserted fragment interrupts a gene or its regulatory sequences. The inserted DNA, whose sequence is known, then serves as a molecular tag that aids in the subsequent identification and cloning of the disrupted gene (Figure 8-55). In Drosophila, the use of the transposable P element to inactivate genes has revolutionized the study of gene function in the fruit fly. Transposable elements (see Table 5-3, p. 287) have also been used to generate mutants in bacteria, yeast, and in the flowering plant Arabidopsis. Retroviruses, which copy themselves into the host genome (see Figure 5-73), have been used to disrupt genes in zebrafish and in mice.

Figure 8-55. Insertional mutant of the snapdragon, Antirrhinum. A mutation in a single gene coding for a regulatory protein causes leafy shoots to develop in place of flowers. The mutation allows cells to adopt a character that would be appropriate to a different …

Such studies are well suited for dissecting biological processes in worms and flies, but how can we study gene function in humans? Unlike the organisms we have been discussing, humans do not reproduce rapidly, and they are not intentionally treated with mutagens. Moreover, any human with a serious defect in an essential process, such as DNA replication, would die long before birth.

There are two answers to the question of how we study human genes. First, because genes and gene functions have been so highly conserved throughout evolution, the study of less complex model organisms reveals critical information about similar genes and processes in humans. The corresponding human genes can then be studied further in cultured human cells. Second, many mutations that are not lethal—tissue-specific defects in lysosomes or in cell-surface receptors, for example—have arisen spontaneously in the human population. Analyses of the phenotypes of the affected individuals, together with studies of their cultured cells, have provided many unique insights into important human cell functions. Although such mutations are rare, they are very efficiently discovered because of a unique human property: the mutant individuals call attention to themselves by seeking special medical care.

Genetic Screens Identify Mutants Deficient in Cellular Processes

Once a collection of mutants in a model organism such as yeast or flies has been produced, one generally must examine thousands of individuals to find the altered phenotype of interest. Such a search is called a genetic screen. Because obtaining a mutation in a gene of interest depends on the likelihood that the gene will be inactivated or otherwise mutated during random mutagenesis, the larger the genome, the less likely it is that any particular gene will be mutated. Therefore, the more complex the organism, the more mutants must be examined to avoid missing genes. The phenotype being screened for can be simple or complex. Simple phenotypes are easiest to detect: a metabolic deficiency, for example, in which an organism is no longer able to grow in the absence of a particular amino acid or nutrient.
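The trade-off between genome size and screen size can be made concrete with a back-of-the-envelope estimate. The Python sketch below assumes a deliberately simple model: each mutagenized genome carries a mutation in a given gene independently, with a fixed probability `p`. Both the model and the value of `p` in the example are hypothetical illustrations, not measured rates. If a mutagen dose produces roughly a fixed number of hits per genome, a genome with more genes spreads those hits more thinly, lowering `p` for each gene and raising the number of genomes that must be screened.

```python
import math


def prob_gene_hit(p: float, n_genomes: int) -> float:
    """Chance that a particular gene is mutated at least once among n_genomes.

    Assumes each mutagenized genome independently carries a mutation in the
    gene with probability p (a simplifying, hypothetical model).
    """
    return 1.0 - (1.0 - p) ** n_genomes


def genomes_needed(p: float, target: float) -> int:
    """Smallest number of genomes for which prob_gene_hit(p, n) >= target."""
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p and target must lie strictly between 0 and 1")
    # Solve 1 - (1 - p)**n >= target for n, then round up to a whole genome.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Hypothetical per-gene hit probability of 1 in 1000 per genome.
print(genomes_needed(0.001, 0.95))   # roughly 3000 genomes
print(genomes_needed(0.0005, 0.95))  # halving p roughly doubles the screen
```

Under this model the required screen size scales roughly as 1/p, which is why screens in organisms with more genes must examine correspondingly more mutants.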

Phenotypes that are more complex, for example mutations that cause defects in learning or memory, may require more elaborate screens (Figure 8-56). But even genetic screens that are used to dissect complex physiological systems should be as simple as possible in design, and, if possible, should permit the examination of large numbers of mutants simultaneously. As an example, one particularly elegant screen was designed to search for genes involved in visual processing in the zebrafish. The basis of this screen, which monitors the fishes' response to motion, is a change in behavior. Wild-type fish tend to swim in the direction of a perceived motion, while mutants with defects in their visual systems swim in random directions—a behavior that is easily detected. One mutant discovered in this screen is called lakritz, which is missing 80% of the retinal ganglion cells that help to relay visual signals from the eye to the brain. As the cellular organization of the zebrafish retina mirrors that of all vertebrates, the study of such mutants should also provide insights into visual processing in humans.

Figure 8-56. Screens can detect mutations that affect an animal's behavior. (A) Wild-type C. elegans engage in social feeding. The worms swim around until they encounter their neighbors and commence feeding. (B) Mutant animals feed by themselves. (Courtesy of Cornelia …)

Because defects in genes that are required for fundamental cell processes—RNA synthesis and processing or cell cycle control, for example—are usually lethal, the functions of these genes are often studied in temperature-sensitive mutants. In these mutants the protein product of the mutant gene functions normally at a medium temperature but can be inactivated by a small increase or decrease in temperature. Thus the abnormality can be switched on and off experimentally simply by changing the temperature. A cell containing a temperature-sensitive mutation in a gene essential for survival cannot grow at the non-permissive temperature but can nevertheless grow at the normal, or permissive, temperature (Figure 8-57). The temperature-sensitive gene in such a mutant usually contains a point mutation that causes a subtle change in its protein product.

Figure 8-57. Screening for temperature-sensitive bacterial or yeast mutants. Mutagenized cells are plated out at the permissive temperature. The resulting colonies are transferred to two identical Petri dishes by replica plating; one of these plates is incubated at …
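Scoring the replica plates of Figure 8-57 reduces to a set comparison: candidate temperature-sensitive mutants are the colonies that grow at the permissive temperature but not at the non-permissive one. A minimal Python sketch, assuming each colony is identified by its position on the plate, which replica plating preserves; the plate data are hypothetical.

```python
def temperature_sensitive_candidates(permissive, restrictive):
    """Colonies that grew at the permissive temperature but not at the
    non-permissive one.

    permissive, restrictive: iterables of colony positions (any hashable ID)
    scored as growing on the two replica plates. Replica plating keeps each
    colony at the same position on both plates, so positions can be compared
    directly.
    """
    return sorted(set(permissive) - set(restrictive))


# Hypothetical plate scores, colonies identified by (row, column) position.
grew_permissive = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]
grew_restrictive = [(0, 0), (1, 0), (2, 0)]
print(temperature_sensitive_candidates(grew_permissive, grew_restrictive))
# [(0, 1), (1, 1)]
```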

Many temperature-sensitive mutants were isolated in the genes that encode the bacterial proteins required for DNA replication by screening populations of mutagen-treated bacteria for cells that stop making DNA when they are warmed from 30°C to 42°C. These mutants were later used to identify and characterize the corresponding DNA replication proteins (discussed in Chapter 5). Temperature-sensitive mutants also led to the identification of many proteins involved in regulating the cell cycle and in moving proteins through the secretory pathway in yeast (see Panel 13-1). Related screening approaches have demonstrated the function of enzymes involved in the principal metabolic pathways of bacteria and yeast (discussed in Chapter 2) and have uncovered many of the gene products responsible for the orderly development of the Drosophila embryo (discussed in Chapter 21).

A Complementation Test Reveals Whether Two Mutations Are in the Same or in Different Genes

A large-scale genetic screen can turn up many different mutants that show the same phenotype. These defects might lie in different genes that function in the same process, or they might represent different mutations in the same gene. How can we tell, then, whether two mutations that produce the same phenotype occur in the same gene or in different genes? If the mutations are recessive—if, for example, they represent a loss of function of a particular gene—a complementation test can be used to ascertain whether the mutations fall in the same or in different genes. In the simplest type of complementation test, an individual that is homozygous for one mutation—that is, it possesses two identical alleles of the mutant gene in question—is mated with an individual that is homozygous for the other mutation. If the two mutations are in the same gene, the offspring show the mutant phenotype, because they still will have no normal copies of the gene in question (see Panel 8-1, pp. 526–527). If, in contrast, the mutations fall in different genes, the resulting offspring show a normal phenotype. They retain one normal copy (and one mutant copy) of each gene. The mutations thereby complement one another and restore a normal phenotype. Complementation testing of mutants identified during genetic screens has revealed, for example, that 5 genes are required for yeast to digest the sugar galactose; that 20 genes are needed for E. coli to build a functional flagellum; that 48 genes are involved in assembling bacteriophage T4 viral particles; and that hundreds of genes are involved in the development of an adult nematode worm from a fertilized egg.
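The outcome of a complementation test follows from which genes lack a normal copy in the offspring, and testing every pair of mutants sorts them into complementation groups. The Python below is a minimal sketch, assuming recessive loss-of-function mutations and that each mutant strain is defective in just one gene; the strain and gene names are hypothetical, and the known gene for each strain is used only to simulate the crosses.

```python
from itertools import combinations


def complements(mutated_in_a, mutated_in_b):
    """True if a cross between two homozygous recessive mutants gives wild-type
    offspring.

    Each argument is the set of genes carrying a loss-of-function mutation in
    one parent. The offspring inherit one genome from each parent, so a gene
    lacks any normal copy only if it is mutated in both parents.
    """
    return not (set(mutated_in_a) & set(mutated_in_b))


def complementation_groups(strains):
    """Group strains that fail to complement one another.

    strains maps a strain name to the set of genes it is mutated in. That set
    stands in for the unknown truth and is used only to simulate each cross.
    """
    group_of = {name: name for name in strains}

    def find(name):
        # Follow links to the representative of the strain's group.
        while group_of[name] != name:
            name = group_of[name]
        return name

    for a, b in combinations(strains, 2):
        if not complements(strains[a], strains[b]):
            group_of[find(a)] = find(b)  # same gene: merge the two groups

    groups = {}
    for name in strains:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Four hypothetical mutants with the same phenotype.
mutants = {"m1": {"geneX"}, "m2": {"geneY"}, "m3": {"geneX"}, "m4": {"geneZ"}}
print(complementation_groups(mutants))  # [['m1', 'm3'], ['m2'], ['m4']]
```

Each complementation group corresponds to one gene, so counting groups is how tallies such as the number of genes needed for galactose use are obtained.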

Once a set of genes involved in a particular biological process has been identified, the next step is to determine in which order the genes function. Determining when a gene acts can facilitate the reconstruction of entire genetic or biochemical pathways, and such studies have been central to our understanding of metabolism, signal transduction, and many other developmental and physiological processes. In essence, untangling the order in which genes function requires careful characterization of the phenotype caused by mutations in each different gene. Imagine, for example, that mutations in a handful of genes all cause an arrest in cell division during early embryo development. Close examination of each mutant may reveal that some act extremely early, preventing the fertilized egg from dividing into two cells. Other mutations may allow early cell divisions but prevent the embryo from reaching the blastula stage.

To test predictions made about the order in which genes function, organisms can be made that are mutant in two different genes. If these mutations affect two different steps in the same process, such double mutants should have a phenotype identical to that of the mutation that acts earliest in the pathway. As an example, the pathway of protein secretion in yeast has been deciphered in this manner. Different mutations in this pathway cause proteins to accumulate aberrantly in the endoplasmic reticulum (ER) or in the Golgi apparatus. When a cell is engineered to harbor both a mutation that blocks protein processing in the ER and a mutation that blocks processing in the Golgi compartment, proteins accumulate in the ER. This indicates that proteins must pass through the ER before they are sent to the Golgi apparatus and, from there, secreted (Figure 8-58).

Figure 8-58. Using genetics to determine the order of function of genes. In normal cells, proteins are loaded into vesicles, which fuse with the plasma membrane and secrete their contents into the extracellular medium. In secretory mutant A, proteins accumulate in …
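The logic of ordering genes with double mutants can be written as a comparison rule: if the double mutant looks like mutant a, then gene a acts before gene b. A minimal Python sketch, assuming a single linear pathway in which every single mutant has a distinct phenotype; the gene names and the secretory-vesicle class are hypothetical, chosen to echo the secretion example above.

```python
from functools import cmp_to_key


def order_pathway(single, double):
    """Order genes along a linear pathway from double-mutant phenotypes.

    single: gene -> phenotype of the single mutant.
    double: frozenset({gene1, gene2}) -> phenotype of the double mutant.
    The double mutant resembles whichever single mutant acts earlier.
    """
    def compare(a, b):
        shared = double[frozenset((a, b))]
        if shared == single[a]:
            return -1  # a acts before b
        if shared == single[b]:
            return 1   # b acts before a
        raise ValueError(f"double mutant {a}/{b} matches neither single mutant")

    return sorted(single, key=cmp_to_key(compare))


# Hypothetical data: where secretory proteins accumulate in each mutant.
single = {"gene2": "Golgi", "gene1": "ER", "gene3": "secretory vesicles"}
double = {
    frozenset({"gene1", "gene2"}): "ER",
    frozenset({"gene1", "gene3"}): "ER",
    frozenset({"gene2", "gene3"}): "Golgi",
}
print(order_pathway(single, double))  # ['gene1', 'gene2', 'gene3']
```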

Genes Can Be Located by Linkage Analysis

With mutants in hand, the next step is to identify the gene or genes that seem to be responsible for the altered phenotype. If insertional mutagenesis was used for the original mutagenesis, locating the disrupted gene is fairly simple. DNA fragments containing the insertion (a transposon or a retrovirus, for example) are collected and amplified, and the nucleotide sequence of the flanking DNA is determined. This sequence is then used to search a DNA database to identify the gene that was interrupted by insertion of the transposable element.
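Identifying the gene disrupted by an insertion can be sketched in two steps: read past the known end of the inserted element to recover the adjacent genomic sequence, then look up where that sequence falls in an annotated genome. The Python below is a toy illustration: it uses exact string matching on one strand, whereas real searches use alignment tools such as BLAST that tolerate sequencing errors and check both strands. The insert-end sequence, genome, and gene coordinates are hypothetical.

```python
from typing import Optional


def flanking_sequence(read: str, insert_end: str, length: int) -> Optional[str]:
    """Return the `length` bases of genomic DNA that follow the known end of
    the inserted element in a sequencing read, or None if absent."""
    i = read.find(insert_end)
    if i == -1:
        return None
    start = i + len(insert_end)
    flank = read[start:start + length]
    return flank if len(flank) == length else None


def disrupted_gene(flank, genome, genes):
    """Find which annotated gene contains the flanking sequence.

    genes: list of (name, start, end) with 0-based, end-exclusive coordinates.
    Only exact matches on the given strand are considered.
    """
    pos = genome.find(flank)
    if pos == -1:
        return None
    for name, start, end in genes:
        if start <= pos < end:
            return name
    return None  # insertion landed between annotated genes


# Hypothetical toy data.
genome = "AAAAACCCCCGGGGGTTTTTACGTACGTAC"
genes = [("geneA", 0, 10), ("geneB", 10, 25)]
read = "CC" + "TTTAAAGGG" + "GGGTTTTTAC"  # vector, insert end, genomic flank
flank = flanking_sequence(read, "TTTAAAGGG", 10)
print(flank, disrupted_gene(flank, genome, genes))  # GGGTTTTTAC geneB
```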

If a DNA-damaging chemical was used to generate the mutants, identifying the inactivated gene is often more laborious and can be accomplished by several different approaches. In one, the first step is to determine where on the genome the gene is located. To map a newly discovered gene, its rough chromosomal location is first determined by assessing how far the gene lies from other known genes in the genome. Estimating the distance between genetic loci is usually done by linkage analysis, a technique that relies on the fact that genes that lie near one another on a chromosome tend to be inherited together. The closer the genes are, the greater the likelihood they will be passed to offspring as a pair. Even closely linked genes, however, can be separated by recombination during meiosis. The larger the distance between two genetic loci, the greater the chance that they will be separated by a crossover (see Panel 8-1, pp. 526–527). By calculating the recombination frequency between two genes, the approximate distance between them can be determined.
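The conversion from recombination frequency to approximate distance is simple arithmetic. The sketch below uses the conventional approximation that 1% recombinant offspring corresponds to one map unit (one centimorgan), which is reliable only for closely linked loci. The three-locus helper `middle_locus` and all example values are hypothetical illustrations of how pairwise distances place a gene between two others.

```python
def recombination_frequency(recombinants: int, total: int) -> float:
    """Fraction of offspring in which the two loci were separated by crossover."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    return recombinants / total


def map_distance_cm(recombinants: int, total: int) -> float:
    """Approximate map distance, using 1% recombination = 1 centimorgan (cM).

    The approximation holds only for closely linked loci: multiple crossovers
    go uncounted, and unlinked loci recombine at about 50%.
    """
    return 100.0 * recombination_frequency(recombinants, total)


def middle_locus(distances):
    """For three loci, the pair with the largest distance flanks the third.

    distances: frozenset({locus1, locus2}) -> map distance for all three pairs.
    """
    outer = max(distances, key=distances.get)
    all_loci = set().union(*distances)
    (middle,) = all_loci - outer
    return middle


# Hypothetical counts: 30 recombinants among 1000 offspring.
print(map_distance_cm(30, 1000))  # about 3 cM

# Hypothetical pairwise distances (cM) among loci a, b and c.
distances = {
    frozenset({"a", "b"}): 10.0,
    frozenset({"b", "c"}): 5.0,
    frozenset({"a", "c"}): 14.5,
}
print(middle_locus(distances))  # b
```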

Because genes are not always located close enough to one another to allow a precise pinpointing of their position, linkage analyses often rely on physical markers along the genome for estimating the location of an unknown gene. These markers are generally short stretches of DNA, of known sequence and genome location, that can exist in at least two allelic forms. Single-nucleotide polymorphisms (SNPs), for example, are sites at which the DNA sequence differs by a single nucleotide among individuals in a population. SNPs can be detected by hybridization techniques. Many such physical markers, distributed all along the length of chromosomes, have been collected for a variety of organisms, including more than 10⁶ for humans. If the distribution of these markers is sufficiently dense, one can, through a linkage analysis that tests for the tight coinheritance of one or more SNPs with the mutant phenotype, narrow the potential location of a gene to a chromosomal region that may contain only a few gene sequences. These are then considered candidate genes, and their structure and function can be tested directly to determine which gene is responsible for the original mutant phenotype.
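The marker-based search for tight coinheritance can be sketched as a ranking problem: for each SNP, count the offspring in which the marker allele and the phenotype were separated, and favor the markers with the fewest such recombinants. This is a minimal Python sketch under stated assumptions: the record format, the `"M"`/`"W"` allele coding, and the four offspring are hypothetical, and a real analysis would need many more meioses plus a statistical test of linkage.

```python
def rank_markers(offspring):
    """Rank SNP markers by how tightly they cosegregate with a mutant phenotype.

    offspring: list of records {"mutant": bool, "markers": {snp: "M" or "W"}},
    where "M" means the offspring inherited the allele that sits on the
    mutation-bearing parental chromosome. A mismatch between marker allele and
    phenotype counts as a recombinant.
    Returns (recombination fraction, marker) pairs, tightest linkage first.
    """
    if not offspring:
        raise ValueError("need at least one offspring record")
    scores = []
    for snp in offspring[0]["markers"]:
        recombinants = sum(
            1 for o in offspring if (o["markers"][snp] == "M") != o["mutant"]
        )
        scores.append((recombinants / len(offspring), snp))
    return sorted(scores)


# Hypothetical genotypes for four offspring at three SNPs.
offspring = [
    {"mutant": True,  "markers": {"snp1": "M", "snp2": "M", "snp3": "W"}},
    {"mutant": True,  "markers": {"snp1": "W", "snp2": "M", "snp3": "M"}},
    {"mutant": False, "markers": {"snp1": "W", "snp2": "W", "snp3": "M"}},
    {"mutant": False, "markers": {"snp1": "M", "snp2": "W", "snp3": "W"}},
]
print(rank_markers(offspring))
# [(0.0, 'snp2'), (0.5, 'snp1'), (0.5, 'snp3')]
```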

Linkage analysis can be used in the same way to identify the genes responsible for heritable human disorders. Such studies require that DNA samples be collected from a large number of families affected by the disease. These samples are examined for the presence of physical markers such as SNPs that seem to be closely linked to the disease gene—these sequences would nearly always be inherited by individuals who have the disease, and not by their unaffected relatives. The disease gene is then located as described above (Figure 8-59). The genes for cystic fibrosis and Huntington's disease, for example, were discovered in this manner.

Figure 8-59. Genetic linkage analysis using physical markers on the DNA to find a human gene. In this example, one studies the coinheritance of a specific human phenotype (here a genetic disease) with a SNP marker. If individuals who inherit the disease nearly always …

Searching for Homology Can Help Predict a Gene's Function

Once a gene has been identified, its function can often be predicted by identifying homologous genes whose functions are already known. As we discussed earlier, databases containing nucleotide sequences from a variety of organisms—including the complete genome sequences of many dozens of microbes, C. elegans, A. thaliana, D. melanogaster, and human—can be searched for sequences that are similar to those of the uncharacterized target gene.
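Database searches of this kind rest on scoring how well two sequences align. The sketch below implements the Smith-Waterman local-alignment recurrence with a simple, arbitrary scoring scheme (match +2, mismatch -1, gap -2) and ranks a toy database by score. It is illustrative only: practical tools such as BLAST use faster heuristics, substitution matrices, and statistical significance estimates, none of which are modeled here, and the database entries are hypothetical.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell score never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def best_hits(query: str, database: dict, top: int = 3):
    """Score the query against every database entry; highest scores first."""
    scored = [(smith_waterman_score(query, seq), name)
              for name, seq in database.items()]
    return sorted(scored, key=lambda hit: (-hit[0], hit[1]))[:top]


# Hypothetical toy database.
db = {"annotated_gene": "GATTACAGATTACA", "unrelated_gene": "CCCCCCCCCC"}
print(best_hits("TTACAGAT", db))
# [(16, 'annotated_gene'), (2, 'unrelated_gene')]
```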

When analyzing a newly sequenced genome, such a search serves as a first-pass attempt to assign functions to as many genes as possible, a process called annotation. Further genetic and biochemical studies are then performed to confirm whether the gene encodes a product with the predicted function, as we discuss shortly. Homology analysis does not always reveal information about function: in the case of the yeast genome, 30% of the previously uncharacterized genes could be assigned a putative function by homology analysis; 10% had homologues whose function was also unknown; and another 30% had no homologues in any existing databases. (The remaining 30% of the genes had been identified before sequencing the yeast genome.)

In some cases, a homology search turns up a gene in organism A which produces a protein that, in a different organism, is fused to a second protein that is produced by an independent gene in organism A. In yeast, for example, two separate genes encode two proteins that are involved in the synthesis of tryptophan; in E. coli, however, these two genes are fused into one (Figure 8-60). Knowledge that these two proteins in yeast correspond to two domains in a single bacterial protein means that they are likely to be functionally associated, and probably work together in a protein complex. More generally, this approach is used to establish functional links between genes that, for most organisms, are widely separated in the genome.

Figure 8-60. Domain fusions reveal relationships between functionally linked genes. In this example, the functional interaction of genes 1 and 2 in organism A is inferred by the fusion of homologous domains into a single gene (gene 3) in organism B.
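The domain-fusion inference can be expressed as a search over domain assignments: whenever two domains encoded by separate genes in organism A appear together in a single protein of organism B, the two A genes are flagged as likely functional partners. A minimal Python sketch, assuming each gene's domain content has already been assigned (for example, by homology searches); the domain labels and gene names are hypothetical and mirror Figure 8-60.

```python
from itertools import combinations


def fusion_links(genes_a, fused_b):
    """Infer functional links in organism A from domain fusions in organism B.

    genes_a: gene in organism A -> the domain family its product contains.
    fused_b: gene in organism B -> domain families its product contains.
    Returns (gene, gene, B gene) triples for pairs of separate A genes whose
    domains occur together in one B protein.
    """
    genes_by_domain = {}
    for gene, domain in genes_a.items():
        genes_by_domain.setdefault(domain, []).append(gene)

    links = set()
    for b_gene, domains in fused_b.items():
        for d1, d2 in combinations(sorted(set(domains)), 2):
            for g1 in genes_by_domain.get(d1, []):
                for g2 in genes_by_domain.get(d2, []):
                    if g1 != g2:
                        first, second = sorted((g1, g2))
                        links.add((first, second, b_gene))
    return sorted(links)


# Hypothetical domain assignments mirroring Figure 8-60.
genes_a = {"gene1": "domain_X", "gene2": "domain_Y", "gene4": "domain_Z"}
fused_b = {"gene3": ["domain_X", "domain_Y"], "gene5": ["domain_Z"]}
print(fusion_links(genes_a, fused_b))  # [('gene1', 'gene2', 'gene3')]
```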

Reporter Genes Reveal When and Where a Gene Is Expressed

Clues to gene function can often be obtained by examining when and where a gene is expressed in the cell or in the whole organism. Determining the pattern and timing of gene expression can be accomplished by replacing the coding portion of the gene under study with a reporter gene. In most cases, the expression of the reporter gene is then monitored by tracking the fluorescence or enzymatic activity of its protein product (pp. 518–519).

As discussed in detail in Chapter 7, gene expression is controlled by regulatory DNA sequences, located upstream or downstream of the coding region, which are not generally transcribed. These regulatory sequences, which control which cells will express a gene and under what conditions, can also be made to drive the expression of a reporter gene. One simply replaces the target gene's coding sequence with that of the reporter gene, and introduces these recombinant DNA molecules into cells. The level, timing, and cell specificity of reporter protein production reflect the action of the regulatory sequences that belong to the original gene (Figure 8-61).

Figure 8-61. Using a reporter protein to determine the pattern of a gene's expression. (A) In this example the coding sequence for protein X is replaced by the coding sequence for protein Y. (B) Various fragments of DNA containing candidate regulatory sequences are …

Several other techniques, discussed previously, can also be used to determine the expression pattern of a gene. Hybridization techniques such as Northern analysis (see Figure 8-27) and in situ hybridization for RNA detection (see Figure 8-29) can reveal when genes are transcribed and in which tissue, and how much mRNA they produce.

Microarrays Monitor the Expression of Thousands of Genes at Once

So far we have discussed techniques that can be used to monitor the expression of only a single gene at a time. Many of these methods are fairly labor-intensive: generating reporter gene constructs or GFP fusions requires manipulating DNA and transfecting cells with the resulting recombinant molecules. Even Northern analyses are limited in scope by the number of samples that can be run on an agarose gel. Developed in the 1990s, DNA microarrays have revolutionized the way in which gene expression is now analyzed by allowing the RNA products of thousands of genes to be monitored at once. By examining the expression of so many genes simultaneously, we can now begin to identify and study the gene expression patterns that underlie cellular physiology: we can see which genes are switched on (or off) as cells grow, divide, or respond to hormones or to toxins.

DNA microarrays are little more than glass microscope slides studded with a large number of DNA fragments, each containing a nucleotide sequence that serves as a probe for a specific gene. The densest arrays may contain tens of thousands of these fragments in an area smaller than a postage stamp, allowing thousands of hybridization reactions to be performed in parallel (Figure 8-62). Some microarrays are made from large DNA fragments that have been amplified by PCR and then spotted onto the slides by a robot. Others contain short oligonucleotides that are synthesized on the surface of the glass wafer with techniques similar to those used to etch circuits onto computer chips. In either case, the exact sequence and position of every probe on the chip is known. Thus any nucleotide fragment that hybridizes to a probe on the array can be identified as the product of a specific gene simply by detecting the position to which it is bound.

Figure 8-62. Using DNA microarrays to monitor the expression of thousands of genes simultaneously. To prepare the microarray, DNA fragments—each corresponding to a gene—are spotted onto a slide by a robot. Prepared arrays are also available commercially. …

To use a DNA microarray to monitor gene expression, mRNA from the cells being studied is first extracted and converted to cDNA (see Figure 8-34). The cDNA is then labeled with a fluorescent dye. The microarray is incubated with this labeled cDNA sample and hybridization is allowed to occur (see Figure 8-62). The array is then washed to remove cDNA that is not tightly bound, and the positions in the microarray to which labeled DNA fragments have bound are identified by an automated scanning-laser microscope. The array positions are then matched to the particular gene whose sample of DNA was spotted in this location.

Typically the fluorescent DNA from the experimental sample (labeled, for example, with a red fluorescent dye) is mixed with a reference sample of cDNA fragments labeled with a differently colored fluorescent dye (green, for example). Because the two samples hybridize to the same probes, the color of each spot reports the ratio between them. If the amount of RNA expressed from a particular gene in the cells of interest is increased relative to that of the reference sample, the resulting spot is red. Conversely, if the gene's expression is decreased relative to the reference sample, the spot is green. If expression is unchanged, the red and green signals are equal and the spot appears yellow. Using such an internal reference, gene expression profiles can be tabulated with great precision.
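The two-color readout can be summarized per spot as the base-2 logarithm of the red/green intensity ratio, so that a two-fold increase and a two-fold decrease are symmetric (+1 and -1). The sketch below assumes background-corrected intensities, red for the experimental sample and green for the reference, a hypothetical two-fold cutoff, and an array layout in which each spot position identifies a gene. Real pipelines also normalize for differences between the two dyes, which is omitted here.

```python
import math


def call_spots(spots, layout, threshold=1.0):
    """Turn two-color spot intensities into per-gene expression calls.

    spots: (row, col) -> (red, green) background-corrected intensities, with
        red = experimental sample and green = reference sample.
    layout: (row, col) -> gene whose probe was spotted at that position.
    threshold: cutoff on |log2(red / green)|; 1.0 means a two-fold change.
    """
    calls = {}
    for pos, (red, green) in spots.items():
        if red <= 0 or green <= 0:
            raise ValueError(f"non-positive intensity at {pos}")
        log_ratio = math.log2(red / green)
        if log_ratio >= threshold:
            call = "up"         # red spot: higher than in the reference
        elif log_ratio <= -threshold:
            call = "down"       # green spot: lower than in the reference
        else:
            call = "unchanged"  # near-equal signals: yellow spot
        calls[layout[pos]] = (round(log_ratio, 2), call)
    return calls


# Hypothetical intensities for three spots.
layout = {(0, 0): "geneA", (0, 1): "geneB", (1, 0): "geneC"}
spots = {(0, 0): (4000, 1000), (0, 1): (500, 2000), (1, 0): (1200, 1000)}
print(call_spots(spots, layout))
# {'geneA': (2.0, 'up'), 'geneB': (-2.0, 'down'), 'geneC': (0.26, 'unchanged')}
```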

So far, DNA microarrays have been used to examine everything from the changes in gene expression that make strawberries ripen to the gene expression “signatures” of different types of human cancer cells (see Figure 7-3). Arrays that contain probes representing all 6000 yeast genes have been used to monitor the changes that occur in gene expression as yeast shift from fermenting glucose to growing on ethanol; as they respond to a sudden shift to heat or cold; and as they proceed through different stages of the cell cycle. The first study showed that, as yeast use up the last glucose in their medium, their gene expression pattern changes markedly: nearly 900 genes are more actively transcribed, while another 1200 decrease in activity. About half of these genes have no known function, although this study suggests that they are somehow involved in the metabolic reprogramming that occurs when yeast cells shift from fermentation to respiration.

Comprehensive studies of gene expression also provide an additional layer of information that is useful for predicting gene function. Earlier we discussed how identifying a protein's interaction partners can yield clues about that protein's function. A similar principle holds true for genes: information about a gene's function can be deduced by identifying genes that share its expression pattern. Using a technique called cluster analysis, one can identify sets of genes that are coordinately regulated. Genes that are turned on or turned off together under a variety of different circumstances may work in concert in the cell: they may encode proteins that are part of the same multiprotein machine, or proteins that are involved in a complex coordinated activity, such as DNA replication or RNA splicing. Characterizing an unknown gene's function by grouping it with known genes that share its transcriptional behavior is sometimes called “guilt by association.” Cluster analyses have been used to analyze the gene expression profiles that underlie many interesting biological processes, including wound healing in humans (Figure 8-63).

Figure 8-63. Using cluster analysis to identify sets of genes that are coordinately regulated. Genes that belong to the same cluster may be involved in common cellular pathways or processes. To perform a cluster analysis, microarray data are obtained from cell samples …
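Grouping genes by shared expression behavior can be sketched with a correlation measure and a grouping rule. The Python below uses Pearson correlation between expression profiles and links genes whose correlation reaches a hypothetical cutoff (`min_r`), collecting connected genes into one cluster. This single-linkage grouping is a simplification chosen for brevity; published analyses generally use more elaborate methods such as hierarchical clustering, and the profiles shown are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-regulation signal
    return sxy / math.sqrt(sxx * syy)


def cluster_genes(profiles, min_r=0.9):
    """Group genes whose profiles correlate at >= min_r, directly or through
    a chain of such genes (single-linkage grouping)."""
    genes = sorted(profiles)
    neighbours = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if pearson(profiles[g], profiles[h]) >= min_r:
                neighbours[g].add(h)
                neighbours[h].add(g)
    clusters, seen = [], set()
    for g in genes:
        if g in seen:
            continue
        stack, members = [g], []
        seen.add(g)
        while stack:  # collect every gene connected to g
            current = stack.pop()
            members.append(current)
            for other in neighbours[current]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(members))
    return clusters


# Hypothetical log-ratios across four conditions.
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 3, 2, 1],
    "geneD": [8, 6, 4, 2],
}
print(cluster_genes(profiles))  # [['geneA', 'geneB'], ['geneC', 'geneD']]
```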

Targeted Mutations Can Reveal Gene Function

Although in rapidly reproducing organisms it is often not difficult to obtain mutants that are deficient in a particular process, such as DNA replication or eye development, it can take a long time to trace the defect to a particular altered protein. Recently, recombinant DNA technology and the explosion in genome sequencing have made possible a different type of genetic approach. Instead of beginning with a randomly generated mutant and using it to identify a gene and its protein, one can start with a particular gene and proceed to make mutations in it, creating mutant cells or organisms so as to analyze the gene's function. Because the new approach reverses the traditional direction of genetic discovery—proceeding from genes and proteins to mutants, rather than vice versa—it is commonly referred to as reverse genetics.

Reverse genetics begins with a cloned gene, a protein with interesting properties that has been isolated from a cell, or simply a genome sequence. If the starting point is a protein, the gene encoding it is first identified and, if necessary, its nucleotide sequence is determined. The gene sequence can then be altered in vitro to create a mutant version. This engineered mutant gene, together with an appropriate regulatory region, is transferred into a cell. Inside the cell, it can integrate into a chromosome, becoming a permanent part of the cell's genome. All of the descendants of the modified cell will now contain the mutant gene.

If the original cell used for the gene transfer is a fertilized egg, whole multicellular organisms can be obtained that contain the mutant gene, provided that the mutation does not cause lethality. In some of these animals, the altered gene will be incorporated into the germ cells—a germline mutation—allowing the mutant gene to be passed on to their progeny.

Genetic transformations of this kind are now routinely performed with organisms as complex as fruit flies and mammals. Technically, even humans could now be transformed in this way, although such procedures are not undertaken, even for therapeutic purposes, for fear of the unpredictable aberrations that might occur in such individuals.

Earlier in this chapter we discussed other approaches to discover a gene's function, including searching for homologous genes in other organisms and determining when and where a gene is expressed. This type of information is especially useful in suggesting what sort of phenotypes to look for in the mutant organisms. A gene that is expressed only in adult liver, for example, may have a role in degrading toxins, but is not likely to affect the development of the eye. All of these approaches can be used either to study single genes or to attempt a large-scale analysis of the function of every gene in an organism—a burgeoning field known as functional genomics.

Cells and Animals Containing Mutated Genes Can Be Made to Order

We have seen that searching for homologous genes and analyzing gene expression patterns can provide clues about gene function, but they do not reveal what exactly a gene does inside a cell. Genetics provides a powerful solution to this problem, because mutants that lack a particular gene may quickly reveal the function of the protein that it encodes. Genetic engineering techniques allow one to specifically produce such gene knockouts, as we will see. However, one can also generate mutants that express a gene at abnormally high levels (overexpression), in the wrong tissue or at the wrong time (misexpression), or in a slightly altered form that exerts a dominant phenotype. To facilitate such studies of gene function, the coding sequence of a gene and its regulatory regions can be engineered to change the functional properties of the protein product, the amount of protein made, or the particular cell type in which the protein is produced.

Altered genes are introduced into cells in a variety of ways, some of which are described in detail in Chapter 9. DNA can be microinjected into mammalian cells with a glass micropipette or introduced by a virus that has been engineered to carry foreign genes. In plant cells, genes are frequently introduced by a technique called particle bombardment: DNA samples are painted onto tiny gold beads and then literally shot through the cell wall with a specially modified gun. Electroporation is the method of choice for introducing DNA into bacteria and some other cells. In this technique, a brief electric shock renders the cell membrane temporarily permeable, allowing foreign DNA to enter the cytoplasm.

We will now examine how the study of such mutant cells and organisms allows the dissection of biological pathways.

The Normal Gene in a Cell Can Be Directly Replaced by an Engineered Mutant Gene in Bacteria and Some Lower Eucaryotes

Unlike higher eucaryotes (which are multicellular and diploid), bacteria, yeasts, and the cellular slime mold Dictyostelium generally exist as haploid single cells. In these organisms an artificially introduced DNA molecule carrying a mutant gene can, with a relatively high frequency, replace the single copy of the normal gene by homologous recombination (see p. 276), so that it is easy to produce cells in which the mutant gene has replaced the normal gene (Figure 8-64A). In this way cells can be made to order that produce an altered form of any specific protein or RNA molecule instead of the normal form of the molecule. If the mutant gene is completely inactive and the gene product normally performs an essential function, the cell dies; but in this case a less severely mutated version of the gene can be used to replace the normal gene, so that the mutant cell survives but is abnormal in the process for which the gene is required. Often the mutant of choice is one that produces a temperature-sensitive gene product, which functions normally at one temperature but is inactivated when cells are shifted to a higher or lower temperature.

Figure 8-64. Gene replacement, gene knockout, and gene addition. A normal gene can be altered in several ways in a genetically engineered organism. (A) The normal gene (green) can be completely replaced by a mutant copy of the gene (red), a process called gene replacement. ...

The ability to perform direct gene replacements in lower eucaryotes, combined with the power of standard genetic analyses in these haploid organisms, explains in large part why studies in these types of cells have been so important for working out the details of those processes that are shared by all eucaryotes. As we shall see, gene replacements are possible, but more difficult to perform in higher eucaryotes, for reasons that are not entirely understood.

Engineered Genes Can Be Used to Create Specific Dominant Negative Mutations in Diploid Organisms

Higher eucaryotes, such as mammals, fruit flies, or worms, are diploid and therefore have two copies of each chromosome. Moreover, transfection with an altered gene generally leads to gene addition rather than gene replacement: the altered gene inserts at a random location in the genome, so that the cell (or the organism) ends up with the mutated gene in addition to its normal gene copies.

Because gene addition is much more easily accomplished than gene replacement in higher eucaryotic cells, it is useful to create specific dominant negative mutations in which a mutant gene eliminates the activity of its normal counterparts in the cell. One ingenious approach exploits the specificity of hybridization reactions between two complementary nucleic acid chains. Normally, only one of the two DNA strands in a given portion of double helix is transcribed into RNA, and it is always the same strand for a given gene (see Figure 6-14). If a cloned gene is engineered so that the opposite DNA strand is transcribed instead, it will produce antisense RNA molecules that have a sequence complementary to the normal RNA transcripts. Such antisense RNA, when synthesized in large enough amounts, can often hybridize with the “sense” RNA made by the normal genes and thereby inhibit the synthesis of the corresponding protein (Figure 8-65). A related method involves synthesizing short antisense nucleic acid molecules chemically or enzymatically and then injecting (or otherwise delivering) them into cells, again blocking (although only temporarily) production of the corresponding protein. To avoid degradation of the injected nucleic acid, a stable synthetic RNA analog, called morpholino-RNA, is often used instead of ordinary RNA.
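The base-pairing rule behind antisense RNA can be sketched in a few lines. This is a toy model under our own simplifying assumptions: sequences are plain strings written 5'→3', only Watson–Crick pairs count (no G·U wobble, no partial matches), and the names `antisense` and `can_hybridize` are illustrative, not taken from any library.

```python
# Watson-Crick pairing partners for RNA bases.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense(sense: str) -> str:
    """Return the antisense RNA (written 5'->3') for a sense RNA (written 5'->3').

    The two strands are antiparallel, so the complement is read in reverse order.
    """
    return "".join(PAIR[base] for base in reversed(sense.upper()))


def can_hybridize(strand_a: str, strand_b: str) -> bool:
    """True if two equal-length 5'->3' strands pair perfectly along their whole length."""
    if len(strand_a) != len(strand_b):
        return False
    # Pair strand_a read 5'->3' with strand_b read 3'->5' (antiparallel alignment).
    return all(PAIR[a] == b for a, b in zip(strand_a.upper(), reversed(strand_b.upper())))


mrna = "AUGGCCAAGUAA"  # hypothetical short sense transcript
anti = antisense(mrna)
print(anti)                       # UUACUUGGCCAU
print(can_hybridize(mrna, anti))  # True: sense and antisense form a duplex
print(can_hybridize(mrna, mrna))  # False: this strand does not pair with a copy of itself
```

Reversing before complementing is the step that is easy to forget: the antisense strand is not simply the base-by-base complement read in the same direction.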

Figure 8-65. The antisense RNA strategy for generating dominant negative mutations. Mutant genes that have been engineered to produce antisense RNA, which is complementary in sequence to the RNA made by the normal gene X, can cause double-stranded RNA to form inside ...

As investigators continued to explore the antisense RNA strategy, they made an interesting discovery. An antisense RNA strand can block gene expression, but a preparation of double-stranded RNA (dsRNA), containing both the sense and antisense strands of a target gene, inhibits the activity of target genes even more effectively (see Figure 7-107). This phenomenon, dubbed RNA interference (RNAi), has now been exploited for examining gene function in several organisms.

The RNAi technique has been widely used to study gene function in the nematode C. elegans. When working with worms, introducing the dsRNA is quite simple: RNA can be injected directly into the intestine of the animal, or the worm can be fed with E. coli expressing the target gene dsRNA (Figure 8-66A). The RNA is distributed throughout the body of the worm and is found to inhibit expression of the target gene in different tissue types. Further, as explained in Figure 7-107, the interference is frequently inherited by the progeny of the injected animal. Because the entire genome of C. elegans has been sequenced, RNAi is being used to help in assigning functions to the entire complement of worm genes. In one study, researchers were able to inhibit 96% of the approximately 2300 predicted genes on C. elegans chromosome III. In this way, they identified 133 genes involved in cell division in C. elegans embryos (Figure 8-66C). Of these, only 11 had been previously ascribed a function by direct experimentation.

Figure 8-66. Dominant negative mutations created by RNA interference. (A) Double-stranded RNA (dsRNA) can be introduced into C. elegans (1) by feeding the worms with E. coli expressing the dsRNA or (2) by injecting dsRNA directly into the gut. (B) Wild-type worm embryo. ...

For unknown reasons, RNA interference does not efficiently inactivate all genes, and it can sometimes suppress the activity of a target gene in one tissue but not in another. An alternative way to produce a dominant negative mutation takes advantage of the fact that most proteins function as part of a larger protein complex. Such complexes can often be inactivated by the inclusion of just one nonfunctional component. Therefore, by designing a gene that produces large quantities of a mutant protein that is inactive but still able to assemble into the complex, it is often possible to produce a cell in which all the complexes are inactivated despite the presence of the normal protein (Figure 8-67).
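Why a mutant subunit can be so potent is easy to see with a small calculation. The assumptions are ours, not the text's: the complex contains n identical subunits, normal and mutant subunits assemble at random in proportion to their abundance, and a single mutant subunit inactivates the whole complex. The fraction of fully active complexes is then (1 − f)^n, where f is the mutant share of the subunit pool. The function name `active_fraction` is hypothetical.

```python
from fractions import Fraction


def active_fraction(mutant_to_normal: Fraction, subunits: int) -> Fraction:
    """Fraction of complexes built only from normal subunits.

    Assumes random assembly and that one mutant subunit poisons the complex
    (illustrative assumptions, not a model taken from the text).
    """
    f = mutant_to_normal / (mutant_to_normal + 1)  # mutant share of the subunit pool
    return (1 - f) ** subunits


for ratio in (Fraction(1), Fraction(3), Fraction(10)):
    active = active_fraction(ratio, subunits=4)
    print(f"mutant:normal = {ratio}:1, tetramer -> {float(active):.4f} of complexes active")
```

Under these assumptions, equal amounts of mutant and normal subunit leave only 1/16 of tetramers fully normal, and a threefold excess of mutant subunit drops this to 1/256. This is why the approach relies on producing the mutant protein in large quantities.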

Figure 8-67. A dominant negative effect of a protein. Here a gene is engineered to produce a mutant protein that prevents the normal copies of the same protein from performing their function. In this simple example, the normal protein must form a multisubunit complex ...

If a protein is required for the survival of the cell (or the organism), a dominant negative mutant dies, making it impossible to test the function of the protein. To avoid this problem, one can couple the mutant gene to control sequences that have been engineered to produce the gene product only on command—for example, in response to an increase in temperature or to the presence of a specific signaling molecule. Cells or organisms containing such a dominant mutant gene under the control of an inducible promoter can be deprived of a specific protein at a particular time, and the effect can then be followed. Inducible promoters also allow genes to be switched on or off in specific tissues, allowing one to examine the effect of the mutant gene in selected parts of the organism. In the future, techniques for producing dominant negative mutations to inactivate specific genes are likely to be widely used to determine the functions of proteins in higher organisms.

Gain-of-Function Mutations Provide Clues to the Role Genes Play in a Cell or Organism

In the same way that cells can be engineered to express a dominant negative version of a protein, resulting in a loss-of-function phenotype, they can also be engineered to display a novel phenotype through a gain-of-function mutation. Such mutations may confer a novel activity on a particular protein, or they may cause a protein with normal activity to be expressed at an inappropriate time or in the wrong tissue in an animal. Regardless of the mechanism, gain-of-function mutations can produce a new phenotype in a cell, tissue, or organism.

Often, gain-of-function mutants are generated by expressing a gene at a much higher level than normal in cells. Such overexpression can be achieved by coupling a gene to a powerful promoter sequence and placing it on a multicopy plasmid—or integrating it in multiple copies in the genome. In either case, the gene is present in many copies and each copy directs the transcription of unusually large numbers of mRNA molecules. Although the effect that such overexpression has on the phenotype of an organism must be interpreted with caution, this approach has provided invaluable insights into the activity of many genes. In an alternative type of gain-of-function mutation, the mutant protein is made in normal amounts, but is much more active than its normal counterpart. Such proteins are frequently found in tumors, and they have been exploited to study signal transduction pathways in cells (discussed in Chapter 15).

Genes can also be expressed at the wrong time or in the wrong place in an organism—often with striking results (Figure 8-68). Such misexpression is most often accomplished by re-engineering the genes themselves, thereby supplying them with the regulatory sequences needed to alter their expression.

Figure 8-68. Ectopic misexpression of Wnt, a signaling protein that affects development of the body axis in the early Xenopus embryo. In this experiment, mRNA coding for Wnt was injected into the ventral vegetal blastomere, inducing a second body axis (discussed in ...)

Genes Can Be Redesigned to Produce Proteins of Any Desired Sequence

In studying the action of a gene and the protein it encodes, one does not always wish to make drastic changes—flooding cells with huge quantities of hyperactive protein or eliminating a gene product entirely. It is sometimes useful to make slight changes in a protein's structure so that one can begin to dissect which portions of a protein are important for its function. The activity of an enzyme, for example, can be studied by changing a single amino acid in its active site. Special techniques are required to alter genes, and their protein products, in such subtle ways. The first step is often the chemical synthesis of a short DNA molecule containing the desired altered portion of the gene's nucleotide sequence. This synthetic DNA oligonucleotide is hybridized with single-stranded plasmid DNA that contains the DNA sequence to be altered, using conditions that allow imperfectly matched DNA strands to pair (Figure 8-69). The synthetic oligonucleotide then serves as a primer for DNA synthesis by DNA polymerase, thereby generating a DNA double helix that incorporates the altered sequence into one of its two strands. After this plasmid is transfected into bacteria, DNA replication separates the altered and normal strands, so that some of the progeny plasmids carry the fully modified gene sequence on both strands, and these plasmids are isolated. The appropriate DNA is then inserted into an expression vector so that the redesigned protein can be produced in the appropriate type of cells for detailed studies of its function. By changing selected amino acids in a protein in this way—a technique called site-directed mutagenesis—one can determine exactly which parts of the polypeptide chain are important for such processes as protein folding, interactions with other proteins, and enzymatic catalysis.
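The key design step, writing an oligonucleotide that matches the template everywhere except at the codon to be changed, can be sketched as follows. This is a simplified illustration: the function names, the fixed 12-nucleotide flanks, and the example sequence are hypothetical, and we assume the single-stranded plasmid carries the strand complementary to the coding sequence, so the primer has the coding-strand sequence. Real primer design also weighs melting temperature and secondary structure.

```python
def mutagenic_oligo(coding_seq: str, codon_index: int, new_codon: str, flank: int = 12) -> str:
    """Return an oligo (5'->3') identical to coding_seq except at one codon.

    Assumes the single-stranded plasmid carries the strand complementary to
    coding_seq, so an oligo with the coding-strand sequence can pair with it.
    codon_index is 0-based; flank is the number of matching bases kept on each side.
    """
    coding_seq, new_codon = coding_seq.upper(), new_codon.upper()
    if len(new_codon) != 3:
        raise ValueError("new_codon must be exactly 3 nucleotides")
    start = 3 * codon_index
    end = start + 3
    if end > len(coding_seq):
        raise ValueError("codon_index lies outside coding_seq")
    left = coding_seq[max(0, start - flank):start]  # matched 5' flank
    right = coding_seq[end:end + flank]             # matched 3' flank
    return left + new_codon + right


def mismatches(a: str, b: str) -> int:
    """Count positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


template = "ATGGCTTCTAAAGGCGAAGAACTGTTC"  # hypothetical coding sequence
oligo = mutagenic_oligo(template, codon_index=3, new_codon="GCA")  # Lys (AAA) -> Ala (GCA)
# The changed codon lies within `flank` bases of the start, so the oligo aligns at base 0.
print(oligo, mismatches(oligo, template[:len(oligo)]))
```

In this example only 2 of the 24 bases fail to pair, which is why the oligonucleotide can still anneal under permissive conditions and prime DNA synthesis.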

Figure 8-69. The use of a synthetic oligonucleotide to modify the protein-coding region of a gene by site-directed mutagenesis. (A) A recombinant plasmid containing a gene insert is separated into its two DNA strands. A synthetic oligonucleotide primer corresponding ...

Engineered Genes Can Be Easily Inserted into the Germ Line of Many Animals

When engineering an organism that is to express an altered gene, ideally one would like to be able to replace the normal gene with the altered one so that the function of the mutant protein can be analyzed in the absence of the normal protein. As discussed above, this can be readily accomplished in some haploid, single-celled organisms. We shall see in the following section that much more complicated procedures have been developed that allow gene replacements of this type in mice. Foreign DNA can, however, be rather easily integrated into random positions of many animal genomes. In mammals, for example, linear DNA fragments introduced into cells are rapidly ligated end-to-end by intracellular enzymes to form long tandem arrays, which usually become integrated into a chromosome at an apparently random site. Fertilized mammalian eggs behave like other mammalian cells in this respect. A mouse egg injected with 200 copies of a linear DNA molecule often develops into a mouse containing, in many of its cells, a tandem array of copies of the injected gene integrated at a single random site in one of its chromosomes. If the modified chromosome is present in the germ line cells (eggs or sperm), the mouse will pass these foreign genes on to its progeny.

Animals that have been permanently reengineered by either gene insertion, gene deletion, or gene replacement are called transgenic organisms, and any foreign or modified genes that are added are called transgenes. When the normal gene remains present, only dominant effects of the alteration will show up in phenotypic analyses. Nevertheless, transgenic animals with inserted genes have provided important insights into how mammalian genes are regulated and how certain altered genes (called oncogenes) cause cancer.

It is also possible to produce transgenic fruit flies, in which single copies of a gene are inserted at random into the Drosophila genome. In this case the DNA fragment is first inserted between the two terminal sequences of a Drosophila transposon called the P element. The terminal sequences enable the P element to integrate into Drosophila chromosomes when the P element transposase enzyme is also present (see p. 288). To make transgenic fruit flies, therefore, the appropriately modified DNA fragment is injected into a very young fruit fly embryo along with a separate plasmid containing the gene encoding the transposase. When this is done, the injected gene often enters the germ line in a single copy as the result of a transposition event.

Gene Targeting Makes It Possible to Produce Transgenic Mice That Are Missing Specific Genes

If a DNA molecule carrying a mutated mouse gene is transferred into a mouse cell, it usually inserts into the chromosomes at random, but about once in a thousand times, it replaces one of the two copies of the normal gene by homologous recombination. By exploiting these rare “gene targeting” events, any specific gene can be altered or inactivated in a mouse cell by a direct gene replacement. In the special case in which the gene of interest is inactivated, the resulting animal is called a “knockout” mouse.
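The rarity of gene targeting, about once in a thousand integration events, explains why many transfected cells must be examined. A back-of-the-envelope sketch, assuming (our assumption) that each colony independently has probability p ≈ 0.001 of carrying a correct replacement; real frequencies vary with the locus and the vector, and the function names are hypothetical:

```python
import math


def prob_at_least_one(p: float, colonies: int) -> float:
    """Probability that at least one of `colonies` independent colonies is correctly targeted."""
    return 1.0 - (1.0 - p) ** colonies


def colonies_needed(p: float, confidence: float) -> int:
    """Smallest number of colonies giving at least `confidence` chance of one targeted clone."""
    # Solve 1 - (1 - p)^n >= confidence for the smallest integer n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


p = 1e-3  # "about once in a thousand times"
for n in (100, 1000, 3000):
    print(n, round(prob_at_least_one(p, n), 3))
print("colonies for 95% confidence:", colonies_needed(p, 0.95))
```

With p = 0.001 the calculation calls for roughly 3000 colonies to have a 95% chance of recovering even one targeted clone, which is why the rare correct colonies must be picked out by PCR or Southern blotting.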

The technique works as follows: in the first step, a DNA fragment containing a desired mutant gene (or a DNA fragment designed to interrupt a target gene) is inserted into a vector and then introduced into a special line of embryo-derived mouse stem cells, called embryonic stem cells or ES cells, that grow in cell culture and are capable of producing cells of many different tissue types. After a period of cell proliferation, the rare colonies of cells in which a homologous recombination event is likely to have caused a gene replacement to occur are isolated. The correct colonies among these are identified by PCR or by Southern blotting: they contain recombinant DNA sequences in which the inserted fragment has replaced all or part of one copy of the normal gene. In the second step, individual cells from the identified colony are taken up into a fine micropipette and injected into an early mouse embryo. The transfected embryo-derived stem cells collaborate with the cells of the host embryo to produce a normal-looking mouse; large parts of this chimeric animal, including—in favorable cases—cells of the germ line, often derive from the artificially altered stem cells (Figure 8-70).

Figure 8-70. Summary of the procedures used for making gene replacements in mice. In the first step (A), an altered version of the gene is introduced into cultured ES (embryonic stem) cells. Only a few rare ES cells will have their corresponding normal genes replaced ...

The mice with the transgene in their germ line are bred to produce both a male and a female animal, each heterozygous for the gene replacement (that is, they have one normal and one mutant copy of the gene). When these two mice are in turn mated, one-fourth of their progeny will be homozygous for the altered gene. Studies of these homozygotes allow the function of the altered gene—or the effects of eliminating a gene activity—to be examined in the absence of the corresponding normal gene.
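The one-quarter figure follows from Mendelian segregation: each heterozygous parent passes on its normal (+) or altered (m) allele with equal probability. Enumerating the four equally likely combinations makes this explicit; the allele symbols and the function name `cross` are ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Genotype frequencies among the progeny of two diploid parents.

    Each parent is a two-character string of alleles, e.g. "+m".
    Assumes each allele is transmitted with probability 1/2 (Mendelian segregation).
    """
    # Sort each allele pair so "+m" and "m+" count as the same genotype.
    outcomes = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(outcomes.values())
    return {genotype: Fraction(count, total) for genotype, count in outcomes.items()}


progeny = cross("+m", "+m")
for genotype, freq in sorted(progeny.items()):
    print(genotype, freq)  # ++ 1/4, +m 1/2, mm 1/4
```

Only the "mm" class, one-quarter of the litter on average, lacks any normal copy of the gene.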

The ability to prepare transgenic mice lacking a known normal gene has been a major advance, and the technique is now being used to dissect the functions of a large number of mammalian genes (Figure 8-71). Related techniques can be used to produce conditional mutants, in which a selected gene becomes disrupted in a specific tissue at a certain time in development. The strategy takes advantage of a site-specific recombination system to excise—and thus disable—the target gene in a particular place or at a particular time. The most common of these recombination systems, called Cre/lox, is widely used to engineer gene replacements in mice and in plants (see Figure 5-82). In this case the target gene in ES cells is replaced by a fully functional version of the gene that is flanked by a pair of short DNA sequences, called lox sites, that are recognized by the Cre recombinase protein. The transgenic mice that result are phenotypically normal. They are then mated with transgenic mice that express the Cre recombinase gene under the control of an inducible promoter. In the specific cells or tissues in which Cre is switched on, it catalyzes recombination between the lox sequences—excising a target gene and eliminating its activity. Similar recombination systems are used to generate conditional mutants in Drosophila (see Figure 21-48).
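The outcome of Cre-mediated recombination between two lox sites in the same orientation can be modeled as a string operation: the DNA between the sites is removed and a single lox site remains. This is a toy sketch. The 34-bp loxP sequence is the standard one, but the "gene" and flanking sequences are placeholders, the function names are hypothetical, and the model ignores the inversion that occurs when the two sites face opposite directions.

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # 34-bp loxP: 13-bp repeat, 8-bp spacer, 13-bp repeat


def cre_excise(dna: str, site: str = LOXP) -> str:
    """Simulate Cre recombination between the first two directly repeated lox sites.

    The segment between the sites is removed and one copy of the site remains.
    With fewer than two sites, the sequence is returned unchanged.
    """
    first = dna.find(site)
    if first == -1:
        return dna
    second = dna.find(site, first + len(site))
    if second == -1:
        return dna
    return dna[:first] + site + dna[second + len(site):]


def gene_present(dna: str, cre_on: bool, gene: str) -> bool:
    """Hypothetical readout: does a cell still carry the floxed gene?

    cre_on stands for whether the inducible Cre transgene is switched on in that cell.
    """
    if cre_on:
        dna = cre_excise(dna)
    return gene in dna


target_gene = "ATGAAACCCGGGTTTTAA"  # placeholder "gene"
floxed = "GGGG" + LOXP + target_gene + LOXP + "CCCC"
print(gene_present(floxed, cre_on=False, gene=target_gene))  # True: tissue without Cre
print(gene_present(floxed, cre_on=True, gene=target_gene))   # False: Cre-expressing tissue
```

The same floxed allele is thus intact in tissues where Cre is off and deleted where it is on, which is what makes the mutation conditional.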

Figure 8-71. Mouse with an engineered defect in fibroblast growth factor 5 (FGF5). FGF5 is a negative regulator of hair formation. In a mouse lacking FGF5 (right), the hair is long compared with its heterozygous littermate (left). Transgenic mice with phenotypes that ...

Transgenic Plants Are Important for Both Cell Biology and Agriculture

When a plant is damaged, it can often repair itself by a process in which mature differentiated cells “dedifferentiate,” proliferate, and then redifferentiate into other cell types. In some circumstances the dedifferentiated cells can even form an apical meristem, which can then give rise to an entire new plant, including gametes. This remarkable plasticity of plant cells can be exploited to generate transgenic plants from cells growing in culture.

When a piece of plant tissue is cultured in a sterile medium containing nutrients and appropriate growth regulators, many of the cells are stimulated to proliferate indefinitely in a disorganized manner, producing a mass of relatively undifferentiated cells called a callus. If the nutrients and growth regulators are carefully manipulated, one can induce the formation of a shoot and then root apical meristems within the callus, and, in many species, a whole new plant can be regenerated.

Callus cultures can also be mechanically dissociated into single cells, which will grow and divide as a suspension culture. In several plants—including tobacco, petunia, carrot, potato, and Arabidopsis—a single cell from such a suspension culture can be grown into a small clump (a clone) from which a whole plant can be regenerated. Such a cell, which has the ability to give rise to all parts of the organism, is considered totipotent. Just as mutant mice can be derived by genetic manipulation of embryonic stem cells in culture, so transgenic plants can be created from single totipotent plant cells transfected with DNA in culture (Figure 8-72).

Figure 8-72. A procedure used to make a transgenic plant. (A) Outline of the process. A disc is cut out of a leaf and incubated in culture with Agrobacteria that carry a recombinant plasmid with both a selectable marker and a desired transgene. The wounded cells at ...

The ability to produce transgenic plants has greatly accelerated progress in many areas of plant cell biology. It has had an important role, for example, in isolating receptors for growth regulators and in analyzing the mechanisms of morphogenesis and of gene expression in plants. It has also opened up many new possibilities in agriculture that could benefit both the farmer and the consumer. It has made it possible, for example, to modify the lipid, starch, and protein storage reserves in seeds, to impart pest and virus resistance to plants, and to create modified plants that tolerate extreme habitats such as salt marshes or water-stressed soil.

Many of the major advances in understanding animal development have come from studies on the fruit fly Drosophila and the nematode worm Caenorhabditis elegans, which are amenable to extensive genetic analysis as well as to experimental manipulation. Progress in plant developmental biology has, in the past, been relatively slow by comparison. Many of the plants that have proved most amenable to genetic analysis—such as maize and tomato—have long life cycles and very large genomes, making both classical and molecular genetic analysis time-consuming. Increasing attention is consequently being paid to a fast-growing small weed, the common wall cress (Arabidopsis thaliana), which has several major advantages as a “model plant” (see Figures 1-46 and 21-107). The relatively small Arabidopsis genome was the first plant genome to be completely sequenced.

Large Collections of Tagged Knockouts Provide a Tool for Examining the Function of Every Gene in an Organism

Extensive collaborative efforts are underway to generate comprehensive libraries of mutations in several model organisms, including S. cerevisiae, C. elegans, Drosophila, Arabidopsis, and the mouse. The ultimate aim in each case is to produce a collection of mutant strains in which every gene in the organism has either been systematically deleted, or altered such that it can be conditionally disrupted. Collections of this type will provide an invaluable tool for investigating gene function on a genomic scale. In some cases, each of the individual mutants within the collection will sport a distinct molecular tag—a unique DNA sequence designed to make identification of the altered gene rapid and routine.

In S. cerevisiae, the task of generating a set of 6000 mutants, each missing only one gene, is made simpler by yeast's propensity for homologous recombination. For each gene, a “deletion cassette” is prepared. The cassette consists of a special DNA molecule that contains 50 nucleotides identical in sequence to each end of the targeted gene, surrounding a selectable marker. In addition, a special “barcode” sequence tag is embedded in this DNA molecule to facilitate the later rapid identification of each resulting mutant strain (Figure 8-73). A large mixture of such gene knockout mutants can then be grown under various selective test conditions—such as nutritional deprivation, temperature shift, or the presence of various drugs—and the cells that survive can be rapidly identified by their unique sequence tags. By assessing how well each mutant in the mixture fares, one can begin to assess which genes are essential, useful, or irrelevant for growth under various conditions.
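The logic of the yeast deletion strategy, building a cassette whose ends match the DNA around the target gene and later scoring barcodes after pooled growth, can be sketched as below. Everything here is a toy: the genome segment, gene coordinates, marker, barcode, and read counts are made-up placeholders, and all function names are hypothetical. The 50-nucleotide arm length comes from the text; taking the arms from just outside the gene, so that the whole gene is replaced, is our modeling choice. The fitness score (log2 change in each barcode's share of the pool) is one simple option among several.

```python
import math
import random
from collections import Counter

ARM = 50  # length of each homology arm, as stated in the text


def deletion_cassette(genome: str, start: int, end: int, marker: str, barcode: str) -> str:
    """Build a cassette: left arm + barcode + marker + right arm for the gene genome[start:end]."""
    if start < ARM or end + ARM > len(genome):
        raise ValueError("gene too close to the sequence end for 50-nt arms")
    return genome[start - ARM:start] + barcode + marker + genome[end:end + ARM]


def recombine(genome: str, cassette: str) -> str:
    """Toy homologous recombination: the region spanned by the two arms is replaced by the cassette."""
    left, right = cassette[:ARM], cassette[-ARM:]
    i = genome.find(left)
    if i == -1:
        return genome  # no homology, no replacement
    j = genome.find(right, i + ARM)
    if j == -1:
        return genome
    return genome[:i] + cassette + genome[j + ARM:]


def fitness_scores(before: Counter, after: Counter, pseudo: float = 0.5) -> dict[str, float]:
    """log2 change in each barcode's share of the pool between two time points."""
    n_before, n_after = sum(before.values()), sum(after.values())
    scores = {}
    for tag, count in before.items():
        share_before = (count + pseudo) / n_before
        share_after = (after.get(tag, 0) + pseudo) / n_after  # pseudocount avoids log2(0)
        scores[tag] = math.log2(share_after / share_before)
    return scores


rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(300))  # made-up chromosome segment
BARCODE, MARKER = "GATTACAGATTACAGATTAC", "<marker>"      # placeholder 20-nt tag and marker
cassette = deletion_cassette(genome, 120, 180, MARKER, BARCODE)
recombined = recombine(genome, cassette)

# Pooled growth: tags stand in for barcode sequences; the counts are invented.
before = Counter({"BC001": 1000, "BC002": 1000, "BC003": 1000})
after = Counter({"BC001": 1900, "BC002": 1100, "BC003": 0})
for tag, score in sorted(fitness_scores(before, after).items()):
    print(tag, round(score, 2))
```

In this invented data set, the strain tagged BC003 disappears from the pool under the test condition, the pattern one would expect if the deleted gene were required for growth under that condition.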

Figure 8-73. Making collections of mutant organisms. (A) A deletion cassette for use in yeast contains sequences homologous to each end of a target gene x (red), a selectable marker (blue), and a unique “barcode” sequence, approximately 20 nucleotide ...

The challenge in deriving information from the study of such yeast mutants lies in deducing a gene's activity or biological role based on a mutant phenotype. Some defects—an inability to live without histidine, for example—point directly to the function of the wild-type gene. Other connections may not be so obvious. What might a sudden sensitivity to cold indicate about the role that a particular gene plays in the yeast cell? Such problems are even greater in organisms that are more complex than yeast. The loss of function of a single gene in the mouse, for example, can affect many different tissue types at different stages of development—whereas the loss of other genes is found to have no obvious effect. Adequately characterizing mutant phenotypes in mice often requires a thorough examination, along with extensive knowledge of mouse anatomy, histology, pathology, physiology, and complex behavior.

The insights generated by examination of mutant libraries, however, will be great. For example, studies of an extensive collection of mutants in Mycoplasma genitalium—the organism with the smallest known genome—have identified the minimum complement of genes essential for cellular life. Analysis of the mutant pool suggests that 265–350 of the 480 protein-coding genes in M. genitalium are required for growth under laboratory conditions. Approximately 100 of these essential genes are of unknown function, which suggests that a surprising number of the basic molecular mechanisms that underlie cellular life have yet to be discovered.
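The ranges quoted above translate into simple proportions. The short calculation below only reuses the numbers in the text (265 to 350 essential genes out of 480, about 100 of them of unknown function) and adds no new data.

```python
essential_low, essential_high, total = 265, 350, 480
unknown = 100  # "approximately 100" essential genes of unknown function

# Share of all protein-coding genes that appear essential.
print(f"essential: {essential_low / total:.0%} to {essential_high / total:.0%} of protein-coding genes")
# Share of the essential set with no known function (larger set -> smaller share).
print(f"unknown function: {unknown / essential_high:.0%} to {unknown / essential_low:.0%} of essential genes")
```

Roughly half to three-quarters of the genes appear essential, and something like a third of that essential set had no assigned function, which is the basis for the conclusion that many basic mechanisms remain undiscovered.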


Genetics and genetic engineering provide powerful tools for the study of gene function in both cells and organisms. In the classical genetic approach, random mutagenesis is coupled with screening to identify mutants that are deficient in a particular biological process. These mutants are then used to locate and study the genes responsible for that process.

Gene function can also be ascertained by reverse genetic techniques. DNA engineering methods can be used to mutate any gene and to re-insert it into a cell's chromosomes so that it becomes a permanent part of the genome. If the cell used for this gene transfer is a fertilized egg (for an animal) or a totipotent plant cell in culture, transgenic organisms can be produced that express the mutant gene and pass it on to their progeny. Especially important for cell biology is the ability to alter cells and organisms in highly specific ways—allowing one to discern the effect on the cell or the organism of a designed change in a single protein or RNA molecule.

Many of these methods are being expanded to investigate gene function on a genome-wide scale. Technologies such as DNA microarrays can be used to monitor the expression of thousands of genes simultaneously, providing detailed, comprehensive snapshots of the dynamic patterns of gene expression that underlie complex cellular processes. And the generation of mutant libraries in which every gene in an organism has been systematically deleted or disrupted will provide an invaluable tool for exploring the role of each gene in the elaborate molecular collaboration that gives rise to life.


Copyright © 2002, Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter; Copyright © 1983, 1989, 1994, Bruce Alberts, Dennis Bray, Julian Lewis, Martin Raff, Keith Roberts, and James D. Watson.
Bookshelf ID: NBK26818

