NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Alberts B, Johnson A, Lewis J, et al. Molecular Biology of the Cell. 4th edition. New York: Garland Science; 2002.

DNA-Binding Motifs in Gene Regulatory Proteins

How does a cell determine which of its thousands of genes to transcribe? As mentioned briefly in Chapters 4 and 6, the transcription of each gene is controlled by a regulatory region of DNA relatively near the site where transcription begins. Some regulatory regions are simple and act as switches that are thrown by a single signal. Many others are complex and act as tiny microprocessors, responding to a variety of signals that they interpret and integrate to switch the neighboring gene on or off. Whether complex or simple, these switching devices contain two types of fundamental components: (1) short stretches of DNA of defined sequence and (2) gene regulatory proteins that recognize and bind to them.

We begin our discussion of gene regulatory proteins by describing how these proteins were discovered.

Gene Regulatory Proteins Were Discovered Using Bacterial Genetics

Genetic analyses in bacteria carried out in the 1950s provided the first evidence for the existence of gene regulatory proteins that turn specific sets of genes on or off. One of these regulators, the lambda repressor, is encoded by a bacterial virus, bacteriophage lambda. The repressor shuts off the viral genes that code for the protein components of new virus particles and thereby enables the viral genome to remain a silent passenger in the bacterial chromosome, multiplying with the bacterium when conditions are favorable for bacterial growth (see Figure 5-81). The lambda repressor was among the first gene regulatory proteins to be characterized, and it remains one of the best understood, as we discuss later. Other bacterial regulators respond to nutritional conditions by shutting off genes encoding specific sets of metabolic enzymes when they are not needed. The lac repressor, the first of these bacterial proteins to be recognized, turns off the production of the proteins responsible for lactose metabolism when this sugar is absent from the medium.

The first step toward understanding gene regulation was the isolation of mutant strains of bacteria and bacteriophage lambda that were unable to shut off specific sets of genes. It was proposed at the time, and later proven, that most of these mutants were deficient in proteins acting as specific repressors for these sets of genes. Because these proteins, like most gene regulatory proteins, are present in small quantities, it was difficult and time-consuming to isolate them. They were eventually purified by fractionating cell extracts. Once isolated, the proteins were shown to bind to specific DNA sequences close to the genes that they regulate. The precise DNA sequences that they recognized were then determined by a combination of classical genetics, DNA sequencing, and DNA-footprinting experiments (discussed in Chapter 8).

The Outside of the DNA Helix Can Be Read by Proteins

As discussed in Chapter 4, the DNA in a chromosome consists of a very long double helix (Figure 7-6). Gene regulatory proteins must recognize specific nucleotide sequences embedded within this structure. It was originally thought that these proteins might require direct access to the hydrogen bonds between base pairs in the interior of the double helix to distinguish between one DNA sequence and another. It is now clear, however, that the outside of the double helix is studded with DNA sequence information that gene regulatory proteins can recognize without having to open the double helix. The edge of each base pair is exposed at the surface of the double helix, presenting a distinctive pattern of hydrogen bond donors, hydrogen bond acceptors, and hydrophobic patches for proteins to recognize in both the major and minor groove (Figure 7-7). But only in the major groove are the patterns markedly different for each of the four base-pair arrangements (Figure 7-8). For this reason, gene regulatory proteins generally bind to the major groove—as we shall see.
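The difference between the two grooves can be made concrete by tabulating what each base pair exposes. The following is a minimal sketch in Python, assuming the conventional one-letter encoding of the groups at each base-pair edge (A for hydrogen bond acceptor, D for hydrogen bond donor, M for methyl group, H for nonpolar hydrogen). The dictionary names and the function `distinguishable_groups` are introduced here only for illustration.

```python
# Edge patterns of each base-pair arrangement, read across the groove.
# A = hydrogen bond acceptor, D = hydrogen bond donor,
# M = methyl group, H = nonpolar hydrogen (illustrative encoding).
MAJOR_GROOVE = {
    "A-T": "ADAM",
    "T-A": "MADA",
    "G-C": "AADH",
    "C-G": "HDAA",
}
MINOR_GROOVE = {
    "A-T": "AHA",
    "T-A": "AHA",
    "G-C": "ADA",
    "C-G": "ADA",
}


def distinguishable_groups(groove):
    """Group the base pairs that present an identical pattern in a groove."""
    groups = {}
    for pair, pattern in groove.items():
        groups.setdefault(pattern, []).append(pair)
    return sorted(groups.values())


major_groups = distinguishable_groups(MAJOR_GROOVE)
minor_groups = distinguishable_groups(MINOR_GROOVE)
print("major groove:", major_groups)
print("minor groove:", minor_groups)
```

In this encoding the major groove yields four separate groups, one per base-pair arrangement, whereas the minor groove yields only two: A-T cannot be told from T-A, nor G-C from C-G.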

Figure 7-6. Double-helical structure of DNA. The major and minor grooves on the outside of the double helix are indicated. The atoms are colored as follows: carbon, dark blue; nitrogen, light blue; hydrogen, white; oxygen, red; phosphorus, yellow.

Figure 7-7. How the different base pairs in DNA can be recognized from their edges without the need to open the double helix. The four possible configurations of base pairs are shown, with potential hydrogen bond donors indicated in blue, potential hydrogen bond (more...)

Figure 7-8. A DNA recognition code. The edge of each base pair, seen here looking directly at the major or minor groove, contains a distinctive pattern of hydrogen bond donors, hydrogen bond acceptors, and methyl groups. From the major groove, each of the four base-pair (more...)

Although the patterns of hydrogen bond donor and acceptor groups are the most important features recognized by gene regulatory proteins, they are not the only ones: the nucleotide sequence also determines the overall geometry of the double helix, creating distortions of the “idealized” helix that can also be recognized.

The Geometry of the DNA Double Helix Depends on the Nucleotide Sequence

For 20 years after the discovery of the DNA double helix in 1953, DNA was thought to have the same monotonous structure, with exactly 36° of helical twist between its adjacent nucleotide pairs (10 nucleotide pairs per helical turn) and a uniform helix geometry. This view was based on structural studies of heterogeneous mixtures of DNA molecules, however, and it changed once the three-dimensional structures of short DNA molecules of defined nucleotide sequence were determined using x-ray crystallography and NMR spectroscopy. Whereas the earlier studies provided a picture of an average, idealized DNA molecule, the later studies showed that any given nucleotide sequence had local irregularities, such as tilted nucleotide pairs or a helical twist angle larger or smaller than 36°. These unique features can be recognized by specific DNA-binding proteins.

An especially striking departure from the average structure occurs in nucleotide sequences that cause the DNA double helix to bend. Some sequences (for example, AAAANNN, where N can be any base except A) form a double helix with a pronounced irregularity that causes a slight bend; if this sequence is repeated at 10-nucleotide-pair intervals in a long DNA molecule, the small bends add together so that the DNA molecule appears unusually curved when viewed in the electron microscope (Figure 7-9).
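Why a 10-nucleotide-pair repeat makes small bends accumulate can be illustrated with a simple calculation. The sketch below is a deliberately simplified planar model, not a structural simulation: each bend is treated as a small vector whose direction rotates with the helix at the idealized 36° per nucleotide pair, and the function name `net_bend` and the per-bend magnitude `bend_deg` are hypothetical.

```python
import cmath
import math

TWIST_PER_PAIR_DEG = 36.0  # idealized helical twist quoted in the text


def net_bend(spacing_bp, n_bends, bend_deg=1.0):
    """Magnitude of the vector sum of n_bends small bends.

    Bends are placed every spacing_bp nucleotide pairs. The direction of
    each bend rotates with the helix, so only bends separated by a whole
    number of helical turns point the same way. bend_deg is an assumed,
    illustrative size for one bend.
    """
    total = 0j
    for k in range(n_bends):
        direction = math.radians(k * spacing_bp * TWIST_PER_PAIR_DEG)
        total += bend_deg * cmath.exp(1j * direction)
    return abs(total)


in_phase = net_bend(spacing_bp=10, n_bends=6)      # one bend per helical turn
out_of_phase = net_bend(spacing_bp=15, n_bends=6)  # bends alternate in direction
print(in_phase, out_of_phase)
```

With a spacing of 10 nucleotide pairs every bend points the same way and the six bends sum to about six times a single bend; with a spacing of 15 successive bends point in opposite directions and cancel almost exactly.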

Figure 7-9. Electron micrograph of fragments of a highly bent segment of DNA double helix. The DNA fragments are derived from the small, circular mitochondrial DNA molecules of a trypanosome. Although the fragments are only about 200 nucleotide pairs long, many of (more...)

A related and equally important variable feature of DNA structure is the extent to which the double helix is deformable. For a protein to recognize and bind to a specific DNA sequence, there must be a tight fit between the DNA and the protein, and often the normal DNA conformation must be distorted to maximize this fit (Figure 7-10). The energetic cost of such distortion depends on the local nucleotide sequence. We encountered an example of this in the discussion of nucleosome assembly in Chapter 4: some DNA sequences can accommodate the tight DNA wrapping required for nucleosome formation better than others. Similarly, a few gene regulatory proteins induce a striking bend in the DNA when they bind to it (Figure 7-11). In general, these proteins recognize DNA sequences that are easily bent.

Figure 7-10. DNA deformation induced by protein binding. The figure shows the changes of DNA structure, from the conventional double-helix (A) to a distorted form (B) observed when a well-studied gene regulatory protein (the bacteriophage 434 repressor, a close relative (more...)

Figure 7-11. The bending of DNA induced by the binding of the catabolite activator protein (CAP). CAP is a gene regulatory protein from E. coli. In the absence of the bound protein, this DNA helix is straight.

Short DNA Sequences Are Fundamental Components of Genetic Switches

We have seen how a specific nucleotide sequence can be detected as a pattern of structural features on the surface of the DNA double helix. Particular nucleotide sequences, each typically less than 20 nucleotide pairs in length, function as fundamental components of genetic switches by serving as recognition sites for the binding of specific gene regulatory proteins. Thousands of such DNA sequences have been identified, each recognized by a different gene regulatory protein (or by a set of related gene regulatory proteins). Some of the gene regulatory proteins that are discussed in the course of this chapter are listed in Table 7-1, along with the DNA sequences that they recognize.

Table 7-1. Some Gene Regulatory Proteins and the DNA Sequences That They Recognize.

We now turn to the gene regulatory proteins themselves, the second fundamental component of genetic switches. We begin with the structural features that allow these proteins to recognize short, specific DNA sequences contained in a much longer double helix.

Gene Regulatory Proteins Contain Structural Motifs That Can Read DNA Sequences

Molecular recognition in biology generally relies on an exact fit between the surfaces of two molecules, and the study of gene regulatory proteins has provided some of the clearest examples of this principle. A gene regulatory protein recognizes a specific DNA sequence because the surface of the protein is extensively complementary to the special surface features of the double helix in that region. In most cases the protein makes a large number of contacts with the DNA, involving hydrogen bonds, ionic bonds, and hydrophobic interactions. Although each individual contact is weak, the 20 or so contacts that are typically formed at the protein-DNA interface add together to ensure that the interaction is both highly specific and very strong (Figure 7-12). In fact, DNA-protein interactions include some of the tightest and most specific molecular interactions known in biology.

Figure 7-12. The binding of a gene regulatory protein to the major groove of DNA. Only a single contact is shown. Typically, the protein-DNA interface would consist of 10 to 20 such contacts, involving different amino acids, each contributing to the strength of the (more...)

Although each example of protein-DNA recognition is unique in detail, x-ray crystallographic and NMR spectroscopic studies of several hundred gene regulatory proteins have revealed that many of the proteins contain one or another of a small set of DNA-binding structural motifs. These motifs generally use either α helices or β sheets to bind to the major groove of DNA; this groove, as we have seen, contains sufficient information to distinguish one DNA sequence from any other. The fit is so good that it has been suggested that the dimensions of the basic structural units of nucleic acids and proteins evolved together to permit these molecules to interlock.

The Helix-Turn-Helix Motif Is One of the Simplest and Most Common DNA-binding Motifs

The first DNA-binding protein motif to be recognized was the helix-turn-helix. Originally identified in bacterial proteins, this motif has since been found in hundreds of DNA-binding proteins from both eucaryotes and procaryotes. It is constructed from two α helices connected by a short extended chain of amino acids, which constitutes the “turn” (Figure 7-13). The two helices are held at a fixed angle, primarily through interactions between the two helices. The more C-terminal helix is called the recognition helix because it fits into the major groove of DNA; its amino acid side chains, which differ from protein to protein, play an important part in recognizing the specific DNA sequence to which the protein binds.

Figure 7-13. The DNA-binding helix-turn-helix motif. The motif is shown in (A), where each white circle denotes the central carbon of an amino acid. The C-terminal α helix (red) is called the recognition helix because it participates in sequence-specific recognition (more...)

Outside the helix-turn-helix region, the structure of the various proteins that contain this motif can vary enormously (Figure 7-14). Thus each protein “presents” its helix-turn-helix motif to the DNA in a unique way, a feature thought to enhance the versatility of the helix-turn-helix motif by increasing the number of DNA sequences that the motif can be used to recognize. Moreover, in most of these proteins, parts of the polypeptide chain outside the helix-turn-helix domain also make important contacts with the DNA, helping to fine-tune the interaction.

Figure 7-14. Some helix-turn-helix DNA-binding proteins. All of the proteins bind DNA as dimers in which the two copies of the recognition helix (red cylinder) are separated by exactly one turn of the DNA helix (3.4 nm). The other helix of the helix-turn-helix motif (more...)

The group of helix-turn-helix proteins shown in Figure 7-14 demonstrates a feature that is common to many sequence-specific DNA-binding proteins. They bind as symmetric dimers to DNA sequences that are composed of two very similar “half-sites,” which are also arranged symmetrically (Figure 7-15). This arrangement allows each protein monomer to make a nearly identical set of contacts and enormously increases the binding affinity: as a first approximation, doubling the number of contacts doubles the free energy of the interaction and thereby squares the affinity constant.
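The statement that doubling the free energy squares the affinity constant follows from the standard relation K = exp(−ΔG/RT). The short Python sketch below checks this numerically; the temperature and the monomer binding energy of −6 kcal/mol are assumed values chosen only for illustration, and the function name `affinity_constant` is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)
T_KELVIN = 310.0   # roughly 37 degrees C; an assumed temperature


def affinity_constant(delta_g_kcal):
    """Equilibrium (affinity) constant from a binding free energy.

    Uses K = exp(-dG / RT), with K expressed relative to the 1 M
    standard state. A more negative dG gives a larger K.
    """
    return math.exp(-delta_g_kcal / (R_KCAL * T_KELVIN))


monomer_dg = -6.0  # hypothetical free energy of one monomer's contacts
k_monomer = affinity_constant(monomer_dg)
k_dimer = affinity_constant(2 * monomer_dg)  # twice the contacts, twice dG

print(f"monomer K = {k_monomer:.3g}")
print(f"dimer   K = {k_dimer:.3g}")
print(f"monomer K squared = {k_monomer ** 2:.3g}")
```

Because the free energy appears in the exponent, contributions that add in ΔG multiply in K. This first approximation ignores any cost of distorting the protein or the DNA to fit both half-sites at once.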

Figure 7-15. A specific DNA sequence recognized by the bacteriophage lambda Cro protein. The nucleotides labeled in green in this sequence are arranged symmetrically, allowing each half of the DNA site to be recognized in the same way by each protein monomer, also (more...)
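The symmetry of such a binding site can be tested by comparing one half-site with the reverse complement of the other, since a symmetric dimer reads the two halves on opposite strands. The Python sketch below does this for two hypothetical 12-nucleotide sequences made up for illustration; they are not the Cro recognition sequence, and the function names are likewise assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Sequence of the opposite strand, read in its own 5'-to-3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def half_site_matches(site):
    """Compare the left half-site with the reverse complement of the right.

    Returns (number of matching positions, half-site length). For a site
    of odd length the central nucleotide pair is ignored.
    """
    half = len(site) // 2
    left = site[:half]
    right_rc = reverse_complement(site[len(site) - half:])
    matches = sum(a == b for a, b in zip(left, right_rc))
    return matches, half


perfect_site = "TGTGACGTCACA"    # hypothetical, exactly symmetric
imperfect_site = "TGTGACGTCATA"  # hypothetical, one mismatch between halves

perfect_result = half_site_matches(perfect_site)
imperfect_result = half_site_matches(imperfect_site)
print(perfect_result, imperfect_result)
```

The first sequence matches at all six positions; the second matches at five of six, the kind of "very similar" but not identical half-sites that natural binding sites often show.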

Homeodomain Proteins Constitute a Special Class of Helix-Turn-Helix Proteins

Not long after the first gene regulatory proteins were discovered in bacteria, genetic analyses in the fruit fly Drosophila led to the characterization of an important class of genes, the homeotic selector genes, that play a critical part in orchestrating fly development. As discussed in Chapter 21, they have since proved to have a fundamental role in the development of higher animals as well. Mutations in these genes cause one body part in the fly to be converted into another, showing that the proteins they encode control critical developmental decisions.

When the nucleotide sequences of several homeotic selector genes were determined in the early 1980s, each proved to encode an almost identical stretch of 60 amino acids that defines this class of proteins and is termed the homeodomain. When the three-dimensional structure of the homeodomain was determined, it was seen to contain a helix-turn-helix motif related to that of the bacterial gene regulatory proteins, providing one of the first indications that the principles of gene regulation established in bacteria are relevant to higher organisms as well. More than 60 homeodomain proteins have now been discovered in Drosophila alone, and homeodomain proteins have been identified in virtually all eucaryotic organisms that have been studied, from yeasts to plants to humans.

The structure of a homeodomain bound to its specific DNA sequence is shown in Figure 7-16. Whereas the helix-turn-helix motif of bacterial gene regulatory proteins is often embedded in different structural contexts, the helix-turn-helix motif of homeodomains is always surrounded by the same structure (which forms the rest of the homeodomain), suggesting that the motif is always presented to DNA in the same way. Indeed, structural studies have shown that a yeast homeodomain protein and a Drosophila homeodomain protein have very similar conformations and recognize DNA in almost exactly the same manner, although they are identical at only 17 of 60 amino acid positions (see Figure 3-15).

Figure 7-16. A homeodomain bound to its specific DNA sequence. Two different views of the same structure are shown. (A) The homeodomain is folded into three α helices, which are packed tightly together by hydrophobic interactions. The part containing helix (more...)

There Are Several Types of DNA-binding Zinc Finger Motifs

The helix-turn-helix motif is composed solely of amino acids. A second important group of DNA-binding motifs adds one or more zinc atoms as structural components. Although all such zinc-coordinated DNA-binding motifs are called zinc fingers, this description refers only to their appearance in schematic drawings dating from their initial discovery (Figure 7-17A). Subsequent structural studies have shown that they fall into several distinct structural groups, two of which are considered here. The first type was initially discovered in the protein that activates the transcription of a eucaryotic ribosomal RNA gene. It is a simple structure, consisting of an α helix and a β sheet held together by the zinc (Figure 7-17B). This type of zinc finger is often found in a cluster with additional zinc fingers, arranged one after the other so that the α helix of each can contact the major groove of the DNA, forming a nearly continuous stretch of α helices along the groove. In this way, a strong and specific DNA-protein interaction is built up through a repeating basic structural unit (Figure 7-18). A particular advantage of this motif is that the strength and specificity of the DNA-protein interaction can be adjusted during evolution by changes in the number of zinc finger repeats. By contrast, it is difficult to imagine how any of the other DNA-binding motifs discussed in this chapter could be formed into repeating chains.

Figure 7-17. One type of zinc finger protein. This protein belongs to the Cys-Cys-His-His family of zinc finger proteins, named after the amino acids that grasp the zinc. (A) Schematic drawing of the amino acid sequence of a zinc finger from a frog protein of this (more...)

Figure 7-18. DNA binding by a zinc finger protein. (A) The structure of a fragment of a mouse gene regulatory protein bound to a specific DNA site. This protein recognizes DNA using three zinc fingers of the Cys-Cys-His-His type (see Figure 7-17) arranged as direct (more...)

Another type of zinc finger is found in the large family of intracellular receptor proteins (discussed in detail in Chapter 15). It forms a different type of structure (similar in some respects to the helix-turn-helix motif) in which two α helices are packed together with zinc atoms (Figure 7-19). Like the helix-turn-helix proteins, these proteins usually form dimers that allow one of the two α helices of each subunit to interact with the major groove of the DNA (see Figure 7-14). Although the two types of zinc finger structures discussed in this section are structurally distinct, they share two important features: both use zinc as a structural element, and both use an α helix to recognize the major groove of the DNA.

Figure 7-19. A dimer of the zinc finger domain of the intracellular receptor family bound to its specific DNA sequence. Each zinc finger domain contains two atoms of Zn (indicated by the small gray spheres); one stabilizes the DNA recognition helix (shown in brown (more...)

β sheets Can Also Recognize DNA

In the DNA-binding motifs discussed so far, α helices are the primary mechanism used to recognize specific DNA sequences. One group of gene regulatory proteins, however, has evolved an entirely different and no less ingenious recognition strategy. In this case the information on the surface of the major groove is read by a two-stranded β sheet, with side chains of the amino acids extending from the sheet toward the DNA as shown in Figure 7-20. As in the case of a recognition α helix, this β-sheet motif can be used to recognize many different DNA sequences; the exact DNA sequence recognized depends on the sequence of amino acids that make up the β sheet.

Figure 7-20. The bacterial met repressor protein. The bacterial met repressor regulates the genes encoding the enzymes that catalyze methionine synthesis. When this amino acid is abundant, it binds to the repressor, causing a change in the structure of the protein (more...)

The Leucine Zipper Motif Mediates Both DNA Binding and Protein Dimerization

Many gene regulatory proteins recognize DNA as homodimers, probably because, as we have seen, this is a simple way of achieving strong specific binding (see Figure 7-15). Usually, the portion of the protein responsible for dimerization is distinct from the portion that is responsible for DNA binding (see Figure 7-14). One motif, however, combines these two functions in an elegant and economical way. It is called the leucine zipper motif, so named because of the way the two α helices, one from each monomer, are joined together to form a short coiled-coil (see Figure 3-11). The helices are held together by interactions between hydrophobic amino acid side chains (often on leucines) that extend from one side of each helix. Just beyond the dimerization interface the two α helices separate from each other to form a Y-shaped structure, which allows their side chains to contact the major groove of DNA. The dimer thus grips the double helix like a clothespin on a clothesline (Figure 7-21).

Figure 7-21. A leucine zipper dimer bound to DNA. Two α-helical DNA-binding domains (bottom) dimerize through their α-helical leucine zipper region (top) to form an inverted Y-shaped structure. Each arm of the Y is formed by a single α helix, (more...)

Heterodimerization Expands the Repertoire of DNA Sequences Recognized by Gene Regulatory Proteins

Many of the gene regulatory proteins we have seen thus far bind DNA as homodimers, that is, dimers made up of two identical subunits. However, many gene regulatory proteins, including leucine zipper proteins, can also associate with nonidentical partners to form heterodimers composed of two different subunits. Because heterodimers typically form from two proteins with distinct DNA-binding specificities, the mixing and matching of gene regulatory proteins to form heterodimers greatly expands the repertoire of DNA-binding specificities that these proteins can display. As illustrated in Figure 7-22, three distinct DNA-binding specificities could, in principle, be generated from two types of leucine zipper monomer, while six could be created from three types of monomer, and so on.
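The counts of three and six are the number of unordered pairs, with repetition allowed, that can be drawn from a set of monomers; for n monomer types this is n(n + 1)/2. The following Python sketch enumerates them under the idealized assumption that every pair can form and that each pair has its own specificity. The monomer labels and the function name `possible_dimers` are placeholders, not real proteins.

```python
from itertools import combinations_with_replacement


def possible_dimers(monomers):
    """All homodimers and heterodimers that a set of monomer types could form.

    Order within a pair is ignored, so ("A", "B") and ("B", "A") count once.
    """
    return list(combinations_with_replacement(monomers, 2))


two_types = possible_dimers(["A", "B"])
three_types = possible_dimers(["A", "B", "C"])

print(len(two_types), two_types)
print(len(three_types), three_types)

# General formula for n monomer types: n * (n + 1) / 2
for n in range(1, 6):
    labels = [f"M{i}" for i in range(n)]
    assert len(possible_dimers(labels)) == n * (n + 1) // 2
```

The number of possible dimers grows roughly with the square of the number of monomer types, which is why a modest set of proteins can in principle generate a large repertoire of specificities.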

Figure 7-22. Heterodimerization of leucine zipper proteins can alter their DNA-binding specificity. Leucine zipper homodimers bind to symmetric DNA sequences, as shown in the left-hand and center drawings. These two proteins recognize different DNA sequences, as indicated (more...)

There are, however, limits to this promiscuity: if all the many types of leucine zipper proteins in a typical eucaryotic cell formed heterodimers, the amount of “cross-talk” between the gene regulatory circuits of a cell would be so great as to cause chaos. Whether or not a particular heterodimer can form depends on how well the hydrophobic surfaces of the two leucine zipper α helices mesh with each other, which, in turn, depends on the exact amino acid sequences of the two zipper regions. Thus each leucine zipper protein in the cell can form dimers with only a small set of other leucine zipper proteins.

Heterodimerization is an example of combinatorial control, in which combinations of different proteins, rather than individual proteins, control a cellular process. Heterodimerization is one of the mechanisms used by eucaryotic cells to control gene expression in this way, and it occurs in a wide variety of different types of gene regulatory proteins (Figure 7-23). As we discuss later, however, the formation of heterodimeric gene regulatory complexes is only one of several combinatorial mechanisms for controlling gene expression.

Figure 7-23. A heterodimer composed of two homeodomain proteins bound to its DNA recognition site. The yellow helix 4 of the protein on the right (Matα2) is unstructured in the absence of the protein on the left (Mata1), forming a helix only upon heterodimerization. (more...)

During the evolution of gene regulatory proteins, similar combinatorial principles have produced new DNA-binding specificities by joining two distinct DNA-binding domains into a single polypeptide chain (Figure 7-24).

Figure 7-24. Two DNA-binding domains covalently joined by a flexible polypeptide. The structure shown (called a POU-domain) consists of both a homeodomain and a helix-turn-helix structure (closely related to the bacteriophage λ repressor; see Figure (more...)

The Helix-Loop-Helix Motif Also Mediates Dimerization and DNA Binding

Another important DNA-binding motif, related to the leucine zipper, is the helix-loop-helix (HLH) motif, which should not be confused with the helix-turn-helix motif discussed earlier. An HLH motif consists of a short α helix connected by a loop to a second, longer α helix. The flexibility of the loop allows one helix to fold back and pack against the other. As shown in Figure 7-25, this two-helix structure binds both to DNA and to the HLH motif of a second HLH protein. As with leucine zipper proteins, the second HLH protein can be the same (creating a homodimer) or different (creating a heterodimer). In either case, two α helices that extend from the dimerization interface make specific contacts with the DNA.

Figure 7-25. A helix-loop-helix dimer bound to DNA. The two monomers are held together in a four-helix bundle: each monomer contributes two α helices connected by a flexible loop of protein (red). A specific DNA sequence is bound by the two α helices (more...)

Several HLH proteins lack the α-helical extension responsible for binding to DNA. These truncated proteins can form heterodimers with full-length HLH proteins, but the heterodimers are unable to bind DNA tightly because they form only half of the necessary contacts. Thus, in addition to creating active dimers, heterodimerization provides a way to hold specific gene regulatory proteins in check (Figure 7-26).

Figure 7-26. Inhibitory regulation by truncated HLH proteins. The HLH motif is responsible for both dimerization and DNA binding. On the left, an HLH homodimer recognizes a symmetric DNA sequence. On the right, the binding of a full-length HLH protein (blue) to a (more...)

It Is Not Yet Possible to Accurately Predict the DNA Sequences Recognized by All Gene Regulatory Proteins

The various DNA-binding motifs that we have discussed provide structural frameworks from which specific amino acid side chains extend to contact specific base pairs in the DNA. It is reasonable to ask, therefore, whether there is a simple amino acid-base pair recognition code: is a G-C base pair, for example, always contacted by a particular amino acid side chain? The answer appears to be no, although certain types of amino acid-base interactions appear much more frequently than others (Figure 7-27). As we saw in Chapter 3, protein surfaces of virtually any shape and chemistry can be made from just 20 different amino acids, and a gene regulatory protein uses different combinations of these to create a surface that is precisely complementary to a particular DNA sequence. We know that the same base pair can thereby be recognized in many ways depending on its context (Figure 7-28). Nevertheless, molecular biologists are beginning to understand protein-DNA recognition well enough that we should soon be able to design proteins that will recognize any desired DNA sequence.

Figure 7-27. One of the most common protein-DNA interactions. Because of its specific geometry of hydrogen-bond acceptors (see Figure 7-7), guanine can be unambiguously recognized by the side chain of arginine. Another common protein-DNA interaction was shown in Figure (more...)

Figure 7-28. Summary of sequence-specific interactions between six different zinc fingers and their DNA recognition sequences. Even though all six Zn fingers have the same overall structure (see Figure 7-17), each binds to a different DNA sequence. The numbered amino (more...)

A Gel-Mobility Shift Assay Allows Sequence-specific DNA-binding Proteins to Be Detected Readily

Genetic analyses, which provided a route to the gene regulatory proteins of bacteria, yeast, and Drosophila, are much more difficult in vertebrates. Therefore, the isolation of vertebrate gene regulatory proteins had to await the development of different approaches. Many of these approaches rely on the detection in a cell extract of a DNA-binding protein that specifically recognizes a DNA sequence known to control the expression of a particular gene. The most common way to detect sequence-specific DNA-binding proteins is to use a technique that is based on the effect of a bound protein on the migration of DNA molecules in an electric field.

A DNA molecule is highly negatively charged and will therefore move rapidly toward a positive electrode when it is subjected to an electric field. When analyzed by polyacrylamide-gel electrophoresis, DNA molecules are separated according to their size because smaller molecules are able to penetrate the fine gel meshwork more easily than large ones. Protein molecules bound to a DNA molecule will cause it to move more slowly through the gel; in general, the larger the bound protein, the greater the retardation of the DNA molecule. This phenomenon provides the basis for the gel-mobility shift assay, which allows even trace amounts of a sequence-specific DNA-binding protein to be readily detected. In this assay, a short DNA fragment of specific length and sequence (produced either by DNA cloning or by chemical synthesis) is radioactively labeled and mixed with a cell extract; the mixture is then loaded onto a polyacrylamide gel and subjected to electrophoresis. If the DNA fragment corresponds to a chromosomal region where, for example, several sequence-specific proteins bind, autoradiography will reveal a series of DNA bands, each retarded to a different extent and representing a distinct DNA-protein complex. The proteins responsible for each band on the gel can then be separated from one another by subsequent fractionations of the cell extract (Figure 7-29).

Figure 7-29. A gel-mobility shift assay. The principle of the assay is shown schematically in (A). In this example an extract of an antibody-producing cell line is mixed with a radioactive DNA fragment containing about 160 nucleotides of a regulatory DNA sequence ...
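The qualitative rule stated above (the larger the bound protein, the greater the retardation) can be turned into a rough prediction of band order. The following is only a minimal sketch under that single assumption: the protein names and masses are hypothetical, and real mobilities also depend on the charge and shape of each complex, which the sketch ignores.

```python
def predict_band_order(protein_masses_kda):
    """Order bands from the top of the gel (most retarded) to the bottom (free DNA).

    Assumes retardation increases with the mass of the bound protein and
    nothing else; this is a simplification for illustration only.
    """
    # Free DNA carries no protein, so it is treated as mass 0 and runs fastest.
    bands = [("free DNA", 0.0)]
    for name, mass in protein_masses_kda.items():
        bands.append((name + "-DNA complex", float(mass)))
    # Heaviest complex first: it is expected to migrate the shortest distance.
    bands.sort(key=lambda band: band[1], reverse=True)
    return [label for label, _mass in bands]


# Hypothetical proteins and masses (in kilodaltons), invented for this example.
hypothetical_proteins = {"C1": 30, "C2": 85, "C3": 50}
band_order = predict_band_order(hypothetical_proteins)
print(band_order)
```

Each entry in the returned list corresponds to one expected band on the autoradiograph, read from the loading well downward.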

DNA Affinity Chromatography Facilitates the Purification of Sequence-specific DNA-binding Proteins

A particularly powerful purification method called DNA affinity chromatography can be used once the DNA sequence that a gene regulatory protein recognizes has been determined. A double-stranded oligonucleotide of the correct sequence is synthesized by chemical methods and linked to an insoluble porous matrix such as agarose; the matrix with the oligonucleotide attached is then used to construct a column that selectively binds proteins that recognize the particular DNA sequence (Figure 7-30). Purifications as great as 10,000-fold can be achieved by this means with relatively little effort.

Figure 7-30. DNA affinity chromatography. In the first step, all the proteins that can bind DNA are separated from the remainder of the cellular proteins on a column containing a huge number of different DNA sequences. Most sequence-specific DNA-binding proteins have ...

Although most proteins that bind to a specific DNA sequence are present in a few thousand copies per higher eucaryotic cell (and generally represent only about one part in 50,000 of the total cell protein), enough pure protein can usually be isolated by affinity chromatography to obtain a partial amino acid sequence by mass spectrometry or other means (discussed in Chapter 8). If the complete genome sequence of the organism is known, the partial amino acid sequence can be used to identify the gene. The gene provides the complete amino acid sequence of the protein, and any uncertainties regarding exon and intron boundaries can be resolved by analyzing the mRNA produced by the gene, as described in Chapter 8. The gene also provides the means to produce the protein in unlimited amounts through genetic engineering techniques (discussed in Chapter 8).
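The two figures quoted in this section, an abundance of about one part in 50,000 and purifications of up to 10,000-fold, can be combined in a short calculation. The sketch below assumes that "fold purification" means the factor by which the protein's share of the total protein increases, and that the 10,000-fold figure is applied to a crude extract; both are simplifying assumptions made for illustration, and the function names are invented here.

```python
from fractions import Fraction


def purity_after(initial_fraction, fold_purification):
    """Share of total protein that is the target after enrichment (at most 1)."""
    purity = initial_fraction * fold_purification
    return min(purity, Fraction(1))


def fold_needed_for_purity(initial_fraction, target_purity=Fraction(1)):
    """Fold purification required to raise the target to the given share."""
    return target_purity / initial_fraction


# Abundance quoted in the text: about one part in 50,000 of total cell protein.
initial = Fraction(1, 50_000)

after_column = purity_after(initial, 10_000)
fold_to_pure = fold_needed_for_purity(initial)

print(f"purity after 10,000-fold enrichment: {float(after_column):.0%}")
print(f"fold purification needed for a pure preparation: {int(fold_to_pure):,}")
```

Under these assumptions a single 10,000-fold step would bring the protein to one-fifth of the total, which suggests why more than one fractionation step may be needed to reach a pure preparation.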

The DNA Sequence Recognized by a Gene Regulatory Protein Can Be Determined

Some gene regulatory proteins were discovered before the DNA sequence to which they bound was known. For example, many of the Drosophila homeodomain proteins were discovered through the isolation of mutations that altered fly development. This allowed the genes encoding the proteins to be identified, and the proteins could then be overexpressed in cultured cells and easily purified. One method of determining the DNA sequences recognized by a gene regulatory protein is to use the purified protein to select out from a large pool of short DNA fragments of differing sequence only those that bind tightly to it. After several rounds of selection, the nucleotide sequences of the tightly bound DNAs can be determined, and a consensus DNA recognition sequence for the gene regulatory protein can be formulated (Figure 7-31). The consensus sequence can be used to search genome sequences by computer and thereby identify candidate genes whose transcription might be regulated by the gene regulatory protein of interest. However, this strategy is not foolproof. For example, many organisms produce a set of closely related gene regulatory proteins that recognize very similar DNA sequences, and a sequence search alone cannot distinguish among them. In most cases, predictions of the sites of action of gene regulatory proteins obtained from searching genome sequences must be tested by more direct approaches, such as the one described in the next section.

Figure 7-31. A method for determining the DNA sequence recognized by a gene regulatory protein. A purified gene regulatory protein is mixed with millions of different short DNA fragments, each with a different sequence of nucleotides. A collection of such DNA fragments ...
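The two computational steps described in this section, formulating a consensus from the selected DNAs and then searching a genome sequence for it, can be sketched as follows. This is a minimal illustration, not a standard tool: the selected sequences and the "genome" are invented, the sequences are assumed to be already aligned and of equal length, and the consensus is taken simply as the most frequent base at each position.

```python
from collections import Counter


def consensus(sequences):
    """Return the most frequent base at each position of pre-aligned sequences."""
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*sequences):
        counts = Counter(column)
        # Ties are broken alphabetically so the result is deterministic.
        best = min(counts, key=lambda base: (-counts[base], base))
        result.append(best)
    return "".join(result)


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(genome, motif, max_mismatches=0):
    """Return (position, strand) for each window matching the motif on either strand."""
    hits = []
    width = len(motif)
    targets = [("+", motif), ("-", reverse_complement(motif))]
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        for strand, target in targets:
            mismatches = sum(a != b for a, b in zip(window, target))
            if mismatches <= max_mismatches:
                hits.append((start, strand))
    return hits


# Hypothetical DNAs recovered after several rounds of selection.
selected = ["TAATCC", "TAATCG", "TAATCC", "GAATCC", "TAATCC"]
motif = consensus(selected)

# Hypothetical stretch of genome sequence to be searched.
genome = "ACGTAATCCGTTGGATTAGC"
sites = find_sites(genome, motif)
print(motif, sites)
```

As the text cautions, the positions returned by such a search are only candidate sites; raising `max_mismatches` finds more candidates but also more false ones, and the search cannot tell which of several closely related proteins actually binds.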

A Chromatin Immunoprecipitation Technique Identifies DNA Sites Occupied by Gene Regulatory Proteins in Living Cells

In general, a given gene regulatory protein does not occupy all its potential DNA-binding sites in the genome all the time. Under some conditions, the protein may simply not be synthesized, and so be absent from the cell; or, for example, it may be present but may have to form a heterodimer with another protein to bind DNA efficiently in a living cell; or it may be excluded from the nucleus until an appropriate signal is received from the cell's environment. One method for empirically determining the sites on DNA occupied by a given gene regulatory protein under a particular set of conditions is called chromatin immunoprecipitation (Figure 7-32). Proteins are covalently cross-linked to DNA in living cells, the cells are lysed, and the DNA is mechanically broken into small fragments. Then, antibodies directed against a given gene regulatory protein are used to purify DNA that was covalently cross-linked to the gene regulatory protein due to the protein's close proximity to that DNA at the time of cross-linking. In this way, the DNA sites occupied by the gene regulatory protein in the original cells can be determined.

Figure 7-32. Chromatin immunoprecipitation. This methodology allows the identification of the sites in a genome that are occupied in vivo by a gene regulatory protein. The amplification of DNA by the polymerase chain reaction (PCR) is described in Chapter 8. The identities ...

This method is also routinely used to identify the positions along a genome that are packaged by the various types of modified histones (see Figure 4-35). In this case, antibodies specific to a particular histone modification are employed.
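Because the DNA is broken at random, the fragments recovered with a given antibody should all overlap the site to which the protein was cross-linked, and the region they share narrows down its position. The sketch below illustrates this reasoning only; it assumes that each recovered fragment has already been assigned start and end coordinates on the genome (the text does not describe that step), the coordinates are invented, and the depth threshold is an arbitrary choice.

```python
from collections import defaultdict


def occupied_regions(fragments, min_depth):
    """Return [start, end) regions covered by at least min_depth fragments.

    Each fragment is a (start, end) pair of genome coordinates, end exclusive.
    """
    if min_depth < 1:
        raise ValueError("min_depth must be at least 1")
    # Record where coverage rises (fragment start) and falls (fragment end).
    changes = defaultdict(int)
    for start, end in fragments:
        if start >= end:
            raise ValueError("fragment start must be less than its end")
        changes[start] += 1
        changes[end] -= 1
    regions = []
    depth = 0
    region_start = None
    # Sweep along the genome, tracking how many fragments cover each stretch.
    for position in sorted(changes):
        depth += changes[position]
        if depth >= min_depth and region_start is None:
            region_start = position
        elif depth < min_depth and region_start is not None:
            regions.append((region_start, position))
            region_start = None
    return regions


# Hypothetical coordinates of fragments recovered by immunoprecipitation.
recovered = [(100, 400), (250, 600), (300, 550), (320, 700), (5000, 5300)]
candidate_sites = occupied_regions(recovered, min_depth=3)
print(candidate_sites)
```

An isolated fragment, such as the last one in the example, does not reach the threshold and is treated as background rather than as evidence of an occupied site.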


Summary

Gene regulatory proteins recognize short stretches of double-helical DNA of defined sequence and thereby determine which of the thousands of genes in a cell will be transcribed. Thousands of gene regulatory proteins have been identified in a wide variety of organisms. Although each of these proteins has unique features, most bind to DNA as homodimers or heterodimers and recognize DNA through one of a small number of structural motifs. The common motifs include the helix-turn-helix, the homeodomain, the leucine zipper, the helix-loop-helix, and zinc fingers of several types. The precise amino acid sequence that is folded into a motif determines the particular DNA sequence that is recognized. Heterodimerization increases the range of DNA sequences that can be recognized by gene regulatory proteins. Powerful techniques are available that make use of the DNA-sequence specificity of gene regulatory proteins to identify and isolate these proteins, the genes that encode them, the DNA sequences they recognize, and the genes that they regulate.


Copyright © 2002, Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter; Copyright © 1983, 1989, 1994, Bruce Alberts, Dennis Bray, Julian Lewis, Martin Raff, Keith Roberts, and James D. Watson.
Bookshelf ID: NBK26806

