NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Alberts B, Bray D, Lewis J, et al. Molecular Biology of the Cell. 3rd edition. New York: Garland Science; 1994.

  • By agreement with the publisher, this book is accessible by the search feature, but cannot be browsed.

Chromosomal DNA and Its Packaging 1


For the first 40 years of this century, biologists tended to dismiss the possibility that DNA could carry the genetic information in chromosomes, partly because nucleic acids were erroneously believed to contain only a simple repeating tetranucleotide sequence (such as AGCTAGCTAGCT). We now know, however, that a DNA molecule is an enormously long, unbranched, linear polymer that can contain many millions of nucleotides arranged in an irregular but nonrandom sequence and that the genetic information of a cell is contained in the linear order of the nucleotides in its DNA. The genetic code, written in "words" of three nucleotides (codons that each specify an amino acid, discussed in Chapter 6), neatly solves the problem of storing a large amount of genetic information in a small amount of space: every million "letters" (nucleotides) take up a linear distance of only 3.4 × 10⁵ nm (0.034 cm) and occupy a total volume of about 10⁶ nm³ (10⁻¹⁵ cm³).
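The linear distance quoted above can be checked with a short calculation. This is a minimal sketch that assumes a rise of about 0.34 nm per nucleotide along the helix axis, a value the passage implies but does not state; the constant and function names are illustrative only.

```python
# Assumption (not stated in the passage): the double helix rises about
# 0.34 nm per nucleotide along its axis.
RISE_NM_PER_NUCLEOTIDE = 0.34
NM_PER_CM = 1e7  # 1 cm = 10^7 nm


def linear_length_nm(n_nucleotides: float) -> float:
    """Length in nm of a stretch of DNA n_nucleotides long."""
    return n_nucleotides * RISE_NM_PER_NUCLEOTIDE


length_nm = linear_length_nm(1e6)
length_cm = length_nm / NM_PER_CM

# Unit conversion for the quoted volume: 10^6 nm^3 expressed in cm^3.
volume_nm3 = 1e6
volume_cm3 = volume_nm3 / NM_PER_CM**3

print(f"length: {length_nm:.2e} nm = {length_cm:.3f} cm")
print(f"volume: {volume_nm3:.0e} nm^3 = {volume_cm3:.0e} cm^3")
```

Under that assumption the result reproduces the 3.4 × 10⁵ nm (0.034 cm) figure, and the unit conversion confirms that 10⁶ nm³ corresponds to 10⁻¹⁵ cm³.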

Each DNA molecule is packaged in a separate chromosome, and the total genetic information stored in the chromosomes of an organism is said to constitute its genome. The genome of the E. coli bacterium contains 4.7 × 10⁶ nucleotide pairs of DNA, present in a single double-helical DNA molecule (one chromosome). The human genome, in contrast, contains about 3 × 10⁹ nucleotide pairs, organized as 24 chromosomes (22 different autosomes and 2 different sex chromosomes), and thus consists of 24 different DNA molecules - each containing from 50 × 10⁶ to 250 × 10⁶ nucleotide pairs of DNA. DNA molecules of this size are 1.7 to 8.5 cm long when uncoiled, and even the slightest mechanical force will break them once the chromosomal proteins have been removed.
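The uncoiled lengths quoted above follow directly from the nucleotide-pair counts. The sketch below again assumes a rise of about 0.34 nm per nucleotide pair (an assumption, not a figure given in this paragraph); the function name is hypothetical.

```python
# Assumption: about 0.34 nm of helix length per nucleotide pair.
RISE_NM_PER_PAIR = 0.34
NM_PER_CM = 1e7  # 1 cm = 10^7 nm


def uncoiled_length_cm(nucleotide_pairs: float) -> float:
    """Length in cm of a fully extended DNA molecule."""
    return nucleotide_pairs * RISE_NM_PER_PAIR / NM_PER_CM


# Nucleotide-pair counts taken from the text.
shortest_human_cm = uncoiled_length_cm(50e6)
longest_human_cm = uncoiled_length_cm(250e6)
e_coli_cm = uncoiled_length_cm(4.7e6)

print(f"smallest human chromosome: {shortest_human_cm:.1f} cm")
print(f"largest human chromosome:  {longest_human_cm:.1f} cm")
print(f"E. coli chromosome:        {e_coli_cm * 10:.1f} mm")
```

The two human values match the 1.7 to 8.5 cm range in the text; the same arithmetic puts the E. coli chromosome at roughly 1.6 mm.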

In diploid organisms such as ourselves, there are two copies of each different chromosome, one inherited from the mother and one from the father (except for the sex chromosomes in males, where a Y chromosome is inherited from the father and an X from the mother). A typical human cell thus contains a total of 46 chromosomes and about 6 × 10⁹ nucleotide pairs of DNA. Other mammals have genomes of similar size. This amount of DNA could in theory be packed into a cube 1.9 microns on each side. By comparison, 6 × 10⁹ letters in this book would occupy more than a million pages, thus requiring more than 10¹⁷ times as much space.
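The size of the cube can be estimated roughly as follows. This sketch rests on two assumptions that the passage does not state: the double helix is treated as a solid cylinder 2 nm in diameter, and each nucleotide pair adds 0.34 nm of length. It is one plausible route to the quoted figure, not necessarily the authors' own calculation.

```python
import math

# Assumptions (not from the passage): the helix is modeled as a solid
# cylinder of radius 1 nm that rises 0.34 nm per nucleotide pair.
HELIX_RADIUS_NM = 1.0
RISE_NM_PER_PAIR = 0.34
NUCLEOTIDE_PAIRS = 6e9  # diploid human cell, from the text

volume_per_pair_nm3 = math.pi * HELIX_RADIUS_NM**2 * RISE_NM_PER_PAIR
total_volume_nm3 = NUCLEOTIDE_PAIRS * volume_per_pair_nm3

# Side of a cube with the same volume, converted from nm to microns.
cube_side_um = total_volume_nm3 ** (1 / 3) / 1000

# Total extended length of the same DNA, converted from nm to metres.
total_length_m = NUCLEOTIDE_PAIRS * RISE_NM_PER_PAIR / 1e9

print(f"cube side: {cube_side_um:.2f} microns")
print(f"extended length: {total_length_m:.2f} m")
```

Under these assumptions the cube side comes out a little under 1.9 microns, in line with the figure quoted above, while the same DNA laid end to end would be about 2 m long.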

In this section we consider the relationship between DNA molecules, genes, and chromosomes, and we discuss how the DNA is folded into a compact and orderly structure - the chromosome - while still allowing access to its genetic information. Throughout the discussion, it is important to bear in mind that the chromosomes in a cell change their structure and activities according to the stage of the cell-division cycle: in mitosis, or M phase, they are very highly condensed and transcriptionally inactive; in the other, much longer part of the division cycle, called interphase, they are less condensed and are continuously active in directing RNA synthesis.

Each DNA Molecule That Forms a Linear Chromosome Must Contain a Centromere, Two Telomeres, and Replication Origins 2

To form a functional chromosome, a DNA molecule must be able to do more than direct the synthesis of RNA: it must be able to propagate itself reliably from one cell generation to the next. This requires three types of specialized nucleotide sequences in the DNA, each of which serves to attach specific proteins that guide the machinery that replicates and segregates chromosomes. Experiments in yeasts, whose chromosomes are relatively small and easy to manipulate by recombinant DNA methods, have identified the minimal DNA sequence elements responsible for each of these functions. Two of the three elements were identified by studying small circular DNA molecules that can be propagated as plasmids in cells of the yeast Saccharomyces cerevisiae. In order to replicate, such a DNA molecule requires a specific nucleotide sequence to act as a DNA replication origin; as we discuss below, one can identify the many origins in each yeast chromosome by their ability to allow a test DNA molecule that contains one of them to replicate when free of the host chromosome. A second sequence element, called a centromere, attaches any DNA molecule that contains it to the mitotic spindle during cell division. Each yeast chromosome contains a single centromere; when this sequence is inserted into a plasmid, it guarantees that each daughter cell will receive one of the two copies of the newly replicated plasmid DNA molecule when the yeast cell divides.

The third required sequence element is a telomere, which is needed at each end of a linear chromosome. If a circular plasmid that contains a replication origin and a centromere is broken at a single site to create two free ends in the double helix, it will still replicate and attach to the mitotic spindle, but it will eventually be lost from the progeny cells. This is because replication on the lagging strand of a replication fork requires the presence of some DNA ahead of the sequence to be copied to serve as the template for an RNA primer (see Figure 6-44). Since there can never be such a template for the last few nucleotides of a linear DNA molecule, special mechanisms are required to prevent each such DNA strand from becoming shorter with each replication cycle. Bacteria and many viruses solve this "end-replication problem" by having a circular DNA molecule as their chromosome. Eucaryotic cells have instead evolved a specialized telomeric DNA sequence at each chromosome end. This simple repeating sequence is periodically extended by an enzyme, telomerase, thus compensating for the loss of a few nucleotides of telomeric DNA in each cycle and permitting a linear chromosome to be completely replicated.

Figure 8-4 summarizes the functions of the three DNA sequence elements that are required for a linear chromosome to propagate itself from generation to generation in a yeast cell. These sequence elements are relatively short (typically less than 1000 base pairs each) and therefore utilize only a tiny fraction of the information-carrying capacity of a chromosome. The same three types of sequence elements are thought to operate in human chromosomes, but to date only the human telomere sequences have been well defined. Although the yeast versions of these sequences do not function in higher eucaryotic cells, recombinant DNA methods allow the yeast sequence elements to be added to human DNA molecules, which can then replicate in yeast cells as artificial chromosomes. In this way yeast cells can be used to prepare human genomic DNA libraries (see p. 315) in which each DNA clone (propagated as an artificial chromosome) contains as many as a million nucleotide pairs of human DNA sequence (Figure 8-5).

Figure 8-4. The functions of the three DNA sequence elements needed to produce a stable linear eucaryotic chromosome. Each chromosome has many origins of replication, one centromere, and two telomeres. The centromere serves to hold the two copies of the duplicated ...

Figure 8-5. The making of a yeast artificial chromosome (YAC). A YAC vector allows the cloning of very large DNA molecules. TEL, CEN, and ORI are the telomere, centromere, and replication origin sequence elements, respectively, for the yeast Saccharomyces cerevisiae ...

Most Chromosomal DNA Does Not Code for Proteins or RNAs 3

The genomes of higher organisms seem to contain a large excess of DNA. Long before it was possible to examine the nucleotide sequences of chromosomal DNA directly, it was evident that the amount of DNA in the haploid genome of an organism has no systematic relationship to the complexity of the organism. Human cells, for example, contain about 700 times more DNA than the bacterium E. coli, but some amphibian and plant cells contain 30 times more DNA than human cells (Figure 8-6). Moreover, the genomes of different species of amphibians can vary 100-fold in their DNA content.

Figure 8-6. Lack of relationship between amount of DNA and organism complexity. The amount of DNA in a haploid genome varies over a 100,000-fold range from the smallest procaryotic cell - the mycoplasma - to the large cells of some plants and amphibia. Note that ...

Population geneticists have tried to estimate how much of the DNA in higher organisms codes for essential proteins or RNA molecules on the basis of the following indirect argument. Each gene is inevitably subject to a small risk of accidental mutation, in which nucleotides in the DNA are altered at random. The greater the number of genes, the greater the probability that a mutation will occur in at least one of them. Since most mutations will impair the function of the gene in which they occur, the mutation rate sets an upper limit to the number of essential genes that an organism can depend on for its survival: if there are too many, disaster becomes almost a certainty, as with a complex machine dependent on too many components that are liable to fail. Using this argument and the observed mutation rate, it has been estimated that no more than a small percentage of the mammalian genome can be involved in regulating or encoding essential proteins or RNA molecules. We shall see later that other evidence supports this conclusion.

The most important implication of this estimate is that although the mammalian genome contains enough DNA, in principle, to code for nearly 3 million average-sized proteins (3 × 10⁹ nucleotides), the limited fidelity with which DNA sequences can be maintained means that no mammal (or any other organism) is likely to be constructed from more than perhaps 60,000 essential proteins. Thus, from a genetic point of view, humans are unlikely to be more than about 10 times more complex than the fruit fly Drosophila, which is estimated to have about 5000 essential genes.
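The arithmetic behind these estimates is simple enough to lay out explicitly. The sketch below uses the figure of about 1000 nucleotide pairs per average-sized protein that is given later in this chapter; the variable names are illustrative only.

```python
# Figures taken from the text.
GENOME_NUCLEOTIDES = 3_000_000_000   # haploid mammalian genome
NUCLEOTIDES_PER_PROTEIN = 1_000      # average protein of 300 to 400 amino acids
ESSENTIAL_PROTEINS_LIMIT = 60_000    # upper estimate from the mutation-rate argument
DROSOPHILA_ESSENTIAL_GENES = 5_000

# Coding capacity if every nucleotide were used to encode protein.
max_proteins = GENOME_NUCLEOTIDES // NUCLEOTIDES_PER_PROTEIN

# Fraction of the genome needed to encode the estimated essential proteins.
coding_fraction = (
    ESSENTIAL_PROTEINS_LIMIT * NUCLEOTIDES_PER_PROTEIN / GENOME_NUCLEOTIDES
)

# Ratio of essential gene counts, mammal versus Drosophila.
complexity_ratio = ESSENTIAL_PROTEINS_LIMIT / DROSOPHILA_ESSENTIAL_GENES

print(f"theoretical capacity: {max_proteins:,} proteins")
print(f"fraction needed for essential proteins: {coding_fraction:.0%}")
print(f"mammal / Drosophila essential genes: {complexity_ratio:.0f}x")
```

The capacity works out to 3 million proteins, the essential coding sequences to about 2% of the genome, and the ratio to 12, which the text rounds to "about 10 times."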

Whatever the remaining nonessential DNA in higher eucaryotic chromosomes may do (we discuss this later), the data shown in Figure 8-6 make it clear that it is not a great handicap for a higher eucaryotic cell to carry a large amount of extra DNA. Indeed, even the essential coding regions are often interrupted by long stretches of noncoding DNA.

Each Gene Produces an RNA Molecule 4

The primary function of the genome is to specify RNA molecules. Selected portions of the DNA nucleotide sequence are copied into a corresponding RNA nucleotide sequence, which either encodes a protein (if it is an mRNA) or forms a "structural" RNA, such as a transfer RNA (tRNA) or ribosomal RNA (rRNA) molecule. Each region of the DNA helix that produces a functional RNA molecule constitutes a gene.

In higher eucaryotes genes that are more than 100,000 nucleotide pairs in length are common, and some contain more than 2 million nucleotide pairs (Table 8-1); yet only about 1000 nucleotide pairs are required to encode a protein of average size (one containing 300 to 400 amino acids). Most of the extra length consists of long stretches of noncoding DNA that interrupt the relatively short segments of coding DNA. The coding sequences are called exons; the intervening (noncoding) sequences are called introns. The RNA molecule (called a primary RNA transcript) synthesized from such a gene is altered to remove the intron sequences during its conversion to an mRNA molecule (see Figure 8-2) in the process of RNA splicing, as we discuss later.

Table 8-1. The Size of Some Human Genes in Thousands of Nucleotides.

Large genes consist of a long string of alternating exons and introns, with most of the gene consisting of introns. In addition, each gene is associated with regulatory DNA sequences, which are responsible for ensuring that the gene is transcribed at the proper time and in the appropriate cell type. We discuss in Chapter 9 how these regulatory sequences work. Many of them are located "upstream" (on the 5' side) of the site where RNA transcription begins, but they can also be located "downstream" (on the 3' side) of the site where RNA transcription ends, or even in introns or exons. A typical vertebrate chromosome is illustrated schematically in Figure 8-7, along with one of its many genes.

Figure 8-7. The organization of genes on a typical vertebrate chromosome. Proteins that bind to the DNA in regulatory regions determine whether a gene is transcribed; although often located on the 5' side of a gene, as shown here, regulatory regions can also be located ...

Comparisons Between the DNAs of Related Organisms Distinguish Conserved and Nonconserved Regions of DNA Sequence 5

Technical improvements in DNA sequencing are expected to allow the routine sequencing of stretches of chromosomal DNA that are millions of nucleotide pairs long, so that in the foreseeable future the sequence of all 3 × 10⁹ nucleotides of the human genome will be determined. As more than 90% of this sequence is probably unimportant, it will be crucial to have some way of identifying the small proportion of sequence that is important. One approach to this problem is based on the observation that important sequences are conserved during evolution, while unimportant ones are free to mutate randomly. The strategy, therefore, is to compare the human sequence with that of the corresponding regions of a related genome, such as that of the mouse. Humans and mice are thought to have diverged from a common mammalian ancestor about 80 × 10⁶ years ago, which is long enough for roughly two out of every three nucleotides to have been changed by random mutational events. Consequently, the only regions that will have remained closely similar in the two genomes are those where mutations would impair function and put animals carrying them at a disadvantage, resulting in their elimination from the population by natural selection. Such closely similar regions are known as conserved regions. In general, nonconserved regions represent noncoding DNA - both between genes and in introns - whose sequence is not critical for function, whereas conserved regions represent functionally important exons and regulatory sequences. By revealing in this way the results of a very long natural "experiment," comparative DNA sequencing studies highlight the most interesting regions in genomes. Such studies also provide strong support for the conclusion that only about 10% of the vertebrate genome sequence is vitally important to the organism.
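The comparison strategy described above can be illustrated with a toy sliding-window scan. This is only a sketch: it assumes the two sequences have already been aligned to equal length without gaps, and the sequences, window size, threshold, and function names are all hypothetical choices made for illustration, not real human or mouse data or an established tool.

```python
def window_identity(seq_a: str, seq_b: str, window: int) -> list[float]:
    """Fraction of identical positions in each sliding window.

    Assumes seq_a and seq_b are already aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = [a == b for a, b in zip(seq_a, seq_b)]
    return [
        sum(matches[i:i + window]) / window
        for i in range(len(matches) - window + 1)
    ]


def conserved_windows(seq_a: str, seq_b: str,
                      window: int = 10, threshold: float = 0.9) -> list[int]:
    """Start positions of windows whose identity meets the threshold."""
    identities = window_identity(seq_a, seq_b, window)
    return [i for i, frac in enumerate(identities) if frac >= threshold]


# Hypothetical toy sequences: the first 12 positions are identical
# ("conserved"), the last 10 differ at every position ("nonconserved").
species_a = "ATGGCCATTGTA" + "CCGTTAGGCA"
species_b = "ATGGCCATTGTA" + "TTACCGAATG"

conserved_starts = conserved_windows(species_a, species_b)
print(conserved_starts)
```

Windows that begin inside the identical stretch pass the threshold, while those dominated by the diverged stretch do not. Real comparisons would first need a proper alignment that handles insertions and deletions, which this sketch deliberately leaves out.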

Histones Are the Principal Structural Proteins of Eucaryotic Chromosomes 6

If chromosomes were composed simply of extended DNA, it is difficult to imagine how they could be replicated and segregated to daughter cells without becoming severely tangled or broken. In fact, the DNA of all chromosomes is packaged into a compact structure with the aid of specialized proteins. It is traditional to divide the DNA-binding proteins in eucaryotes into two general classes: the histones and the nonhistone chromosomal proteins. The complex of both classes of proteins with the nuclear DNA of eucaryotic cells is known as chromatin. Histones are unique to eucaryotes. They are present in such enormous quantities (about 60 million molecules of each type per cell, compared to 10,000 molecules per cell for a typical sequence-specific DNA-binding protein) that their total mass in chromatin is about equal to that of the DNA.

Histones are relatively small proteins with a very high proportion of positively charged amino acids (lysine and arginine); the positive charge helps the histones bind tightly to DNA (which is highly negatively charged), regardless of its nucleotide sequence. Histones probably only rarely dissociate from the DNA, and so they are likely to have an influence on any reaction that occurs on chromosomes.

The five types of histones fall into two main groups - the nucleosomal histones and the H1 histones. The nucleosomal histones are small proteins (102-135 amino acids) responsible for coiling the DNA into nucleosomes, as discussed later. These four histones are designated H2A, H2B, H3, and H4. H3 and H4 are among the most highly conserved of all known proteins (Figure 8-8). This evolutionary conservation suggests that their functions involve nearly all of their amino acids, so that a change in any position is deleterious to the cell. This suggestion has been tested in yeasts, where it is possible to mutate a given histone gene in vitro and introduce it into the yeast genome in place of the normal gene. As predicted, many mutations are found to be lethal; some that are not lethal cause changes in the normal pattern of gene expression (discussed in Chapter 9). The H1 histones are larger (containing about 220 amino acids) and have been less conserved during evolution than the nucleosomal histones.

Figure 8-8. The amino acid sequence of histone H4. The amino acids are designated by their single-letter abbreviations, with the positively charged amino acids colored for emphasis. As in the three other nucleosomal histones, an elongated amino-terminal "tail" is reversibly ...

Histones Associate with DNA to Form Nucleosomes, the Unit Particles of Chromatin 7

If it were stretched out, the DNA double helix in each human chromosome would span the cell nucleus thousands of times. Histones play a crucial part in packing this very long DNA molecule in an orderly way into a nucleus only a few micrometers in diameter. Their role in DNA folding is also important for a second reason. As we shall see, not all the DNA is folded in exactly the same way, and the manner in which a region of the genome is packaged into chromatin in a particular cell seems to influence the activity of the genes the region contains.

A major advance in our understanding of chromatin structure came in 1974 with the discovery of the fundamental packing unit known as the nucleosome, which gives chromatin a "beads-on-a-string" appearance in electron micrographs taken after treatments that unfold higher-order packing (Figure 8-9). The long DNA "string" can be broken into nucleosome "beads" by digestion with enzymes that degrade DNA, such as the bacterial enzyme micrococcal nuclease. (Enzymes that degrade both DNA and RNA are called nucleases; enzymes that degrade only DNA are deoxyribonucleases, or DNases.) After digestion for a short period with micrococcal nuclease, only the DNA between the nucleosome beads is degraded. The rest is protected from digestion and remains as double-helical DNA fragments 146 nucleotide pairs long bound to a specific complex of eight nucleosomal histones (the histone octamer). The nucleosome beads obtained in this way have been crystallized and analyzed by x-ray diffraction. Each is a disc-shaped particle with a diameter of about 11 nm containing two copies of each of the four nucleosomal histones H2A, H2B, H3, and H4. This histone octamer forms a protein core around which the double-stranded DNA helix is wound twice (Figure 8-10).

Figure 8-9. Nucleosomes as seen in the electron microscope. These electron micrographs show chromatin strands before and after treatments that unpack, or "decondense," the native structure to produce the "beads-on-a-string" form. The native structure, known as the ...

Figure 8-10. The nature of the nucleosome. (A) depicts two views of the three-dimensional structure of the histone octamer; the general path of the DNA wrapped around it is indicated by a coiled tube (top) and a series of parallel lines (bottom). Two H2A-H2B dimers ...

In undigested chromatin the DNA extends as a continuous double-helical thread from nucleosome to nucleosome. Each nucleosome is separated from the next by a region of linker DNA, which can vary in length from 0 to 80 nucleotide pairs. On average, nucleosomes repeat at intervals of about 200 nucleotide pairs (see Figure 8-10). Thus, a typical eucaryotic gene of 10,000 nucleotide pairs will be associated with 50 nucleosomes, and each human cell with 6 × 10⁹ DNA nucleotide pairs contains 3 × 10⁷ nucleosomes.
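These counts can be reproduced from the 200-nucleotide-pair repeat. The sketch below uses only figures quoted in this chapter (the 146-pair core, two copies of each nucleosomal histone per octamer); the names are illustrative.

```python
# Figures taken from the text.
NUCLEOSOME_REPEAT_PAIRS = 200    # average spacing between nucleosomes
CORE_DNA_PAIRS = 146             # DNA protected by the histone octamer
COPIES_PER_HISTONE_TYPE = 2      # copies of H2A, H2B, H3, H4 per octamer


def nucleosome_count(nucleotide_pairs: int) -> int:
    """Approximate number of nucleosomes on a stretch of DNA."""
    return nucleotide_pairs // NUCLEOSOME_REPEAT_PAIRS


gene_nucleosomes = nucleosome_count(10_000)
cell_nucleosomes = nucleosome_count(6 * 10**9)

# Average linker length implied by the repeat and the core size.
average_linker_pairs = NUCLEOSOME_REPEAT_PAIRS - CORE_DNA_PAIRS

# Molecules of each nucleosomal histone type needed per cell.
histone_molecules_per_type = cell_nucleosomes * COPIES_PER_HISTONE_TYPE

print(f"nucleosomes on a 10,000-pair gene: {gene_nucleosomes}")
print(f"nucleosomes per human cell: {cell_nucleosomes:.0e}")
print(f"average linker: {average_linker_pairs} nucleotide pairs")
print(f"molecules of each nucleosomal histone: {histone_molecules_per_type:.0e}")
```

The implied average linker of 54 nucleotide pairs falls within the stated 0 to 80 range, and the 6 × 10⁷ molecules of each nucleosomal histone agree with the "about 60 million molecules of each type per cell" quoted earlier.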

The Positioning of Nucleosomes on DNA Is Determined by the Propensity of the DNA to Form Tight Loops and by the Presence of Other DNA-bound Proteins 8

Experiments performed in vitro with isolated chromatin suggest that the histone octamers generally remain fixed in one position under physiological conditions, inasmuch as their tight binding to DNA prevents them from sliding back and forth along the helix. There are two main influences that determine where nucleosomes form in the DNA. One is the difficulty of bending the DNA double helix into two tight turns around the outside of the histone octamer, a process that requires substantial compression of the minor groove of the helix. Because A-T-rich sequences in the minor groove are easier to compress than G-C-rich sequences, each histone octamer tends to position itself on the DNA so as to maximize A-T-rich minor grooves on the inside of the DNA coil (Figure 8-11). Thus a segment of DNA that contains short A-T-rich sequences spaced by integral numbers of DNA turns will be much easier to bend around the nucleosome than a segment of DNA lacking this feature. This probably explains some striking cases of very precise positioning of nucleosomes, such as those that bind to the tiny 5S rRNA genes, each of which has a single nucleosome bound to it in a unique location. If the DNA containing the 5S rRNA genes is added in vitro to a mixture of the four purified nucleosomal histones, nucleosomes will re-form at the exact position where they are located in vivo. For most of the DNA sequences found in chromosomes, however, there is no strongly preferred nucleosome binding site; instead, a nucleosome can occupy any one of a number of positions relative to the DNA sequence.

Figure 8-11. The bending of DNA in a nucleosome. The DNA helix makes two tight turns around the histone octamer. This diagram is drawn approximately to scale to illustrate how the minor groove is compressed on the inside of the turn. Due to certain structural features ...

The second important influence on nucleosome positioning is the presence of other tightly bound proteins on the DNA that prevent nucleosomes from forming. For this reason some regions of DNA appear to lack a nucleosome even though they are hundreds of nucleotide pairs long. They can be detected by treating cell nuclei with trace amounts of a deoxyribonuclease (DNase I) that at low concentrations will digest long stretches of nucleosome-free DNA but not the short stretches of linker DNA between nucleosomes. Such nuclease-hypersensitive sites often lie in the regulatory regions of genes (Figure 8-12). The first evidence for this idea came from studies of the monkey virus SV40, whose circular DNA chromosome binds to histones produced by its host cells. The SV40 chromosome often contains a single nucleosome-free region about 300 nucleotide pairs long very near the sequences at which viral DNA synthesis and RNA synthesis begin. Although several sequence-specific DNA-binding proteins are bound to this region, they do not protect long stretches of DNA against nuclease attack as do the nucleosomes, which is why the site is DNase-I sensitive.

Figure 8-12. The location of nuclease-hypersensitive sites in the regulatory regions of active genes. The genes shown encode histones (H1, H2A, H2B, H3, and H4) in Drosophila. The horizontal arrows denoting each gene point in the direction of DNA transcription, which ...

The default state of the DNA in eucaryotic cells is to be fully covered with nucleosomes, and most nucleosome-free regions are specifically created by gene regulatory proteins as part of the process of activating DNA transcription (discussed in Chapter 9). Wherever nucleosomes are specifically positioned by the DNA sequence itself, there may have been evolutionary pressure to keep the adjacent linker DNA free of a nucleosome so as to facilitate its recognition by sequence-specific DNA-binding proteins.

Nucleosomes Are Usually Packed Together by Histone H1 to Form Regular Higher-Order Structures 9

The linker DNA that connects adjacent nucleosomes can vary in length since nucleosomes position themselves according to the local flexibility of the DNA helix and the distribution of other proteins bound to specific DNA sequences. Although long strings of nucleosomes form on most chromosomal DNA, in the living cell chromatin probably rarely adopts the extended "beads-on-a-string" form. Instead, the nucleosomes are packed upon one another to generate regular arrays in which the DNA is even more highly condensed. Thus, when nuclei are very gently lysed onto an electron microscope grid, most of the chromatin is seen to be in the form of a fiber with a diameter of about 30 nm, which is considerably wider than chromatin in the "beads-on-a-string" form. One of several models proposed to explain how nucleosomes are packed in the 30-nm chromatin fiber is illustrated in Figure 8-13. Such models represent an idealized structure, since both the range of linker lengths that result from preferred nucleosome positioning and the presence of occasional nucleosome-free sequences will punctuate the 30-nm fiber with irregular features (Figure 8-14).

Figure 8-13. The 30-nm chromatin fiber. A model to explain how the "beads-on-a-string" form of nucleosomes is packed to form the 30-nm fiber seen in electron micrographs (see Figure 8-9A) in top (A) and side view (B). This type of packing requires one molecule of ...

Figure 8-14. Nucleosome-free regions in 30-nm fibers. A schematic section of chromatin illustrating the interruption of its regular nucleosomal structure by short regions where the chromosomal DNA is unusually vulnerable to digestion by DNase I. At each of these nuclease-hypersensitive ...

The histone H1 molecules, of which there are about six closely related subtypes in a mammalian cell, are thought to be responsible for pulling nucleosomes together to form the 30-nm fiber. The H1 molecule has an evolutionarily conserved globular central region linked to less conserved extended amino-terminal and carboxyl-terminal "arms." Each H1 molecule binds through its globular portion to a unique site on a nucleosome, and its arms extend to contact other sites on the histone cores of adjacent nucleosomes, so that the nucleosomes are pulled together into a regular repeating array (Figure 8-15).

Figure 8-15. The way histone H1 is thought to help pack adjacent nucleosomes together. The globular core of H1 binds to each nucleosome near the site where the DNA helix enters and leaves the histone octamer. When H1 is present on the nucleosomes, 166 nucleotide pairs of ...


A gene is defined as a nucleotide sequence in a DNA molecule that acts as a functional unit for the production of an RNA molecule. A chromosome is formed from a single, enormously long DNA molecule that contains a series of many genes. A chromosomal DNA molecule also contains three other types of functionally important nucleotide sequences: replication origins and telomeres allow the DNA molecule to be replicated, while a centromere is needed to attach the DNA molecule to the mitotic spindle, ensuring its accurate segregation to daughter cells. The human haploid genome contains 3 × 10⁹ DNA nucleotide pairs, divided among 22 different autosomes and 2 sex chromosomes. Only a small percentage of this DNA is thought to code for proteins.

The DNA in eucaryotes is tightly bound to an equal mass of histones, which form a repeating array of DNA-protein particles called nucleosomes. The nucleosome is made up of an octameric core of histone proteins around which the DNA is wrapped twice. The ease with which a segment of DNA can undergo the severe bending required for this wrapping varies with its nucleotide sequence. Despite irregularities such as this, nucleosomes are usually packed together, with the aid of histone H1 molecules, into regular arrays to form a 30-nm fiber.

Copyright © 1994, Bruce Alberts, Dennis Bray, Julian Lewis, Martin Raff, Keith Roberts, and James D Watson.
Bookshelf ID: NBK28383