NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Strachan T, Read AP. Human Molecular Genetics. 2nd edition. New York: Wiley-Liss; 1999.

Chapter 8. Human gene expression

8.1. An overview of gene expression in human cells

The control mechanisms used to regulate human gene expression are fundamentally similar to those found in other mammals, and broadly resemble those of eukaryotes in general. Although they are much more complex than the equivalent mechanisms in organisms with small genomes, many of the same basic principles apply and, as in other eukaryotes, a major level at which gene expression is controlled is the initiation of transcription. Mammals are particularly complex multicellular organisms, so it is perhaps unsurprising that they use some gene control mechanisms which are not found in bacteria or in some other eukaryotes. Various regulatory mechanisms are required to control the many different facets of mammalian gene expression, at both the spatial and temporal levels (Box 8.1). Although simplistic, it is convenient to consider three broad levels at which gene regulation can operate.

Box 8.1. Spatial and temporal restriction of gene expression in mammalian cells. The regulation of gene expression in human and mammalian cells is exerted at different levels in order to achieve restricted expression at a variety of spatial and temporal levels.

Transcriptional regulation of gene expression

We have long been accustomed to the idea that a primary level of gene regulation in eukaryotes is the initiation of transcription. Regulation of expression can occur through the core promoter of a gene, at the level of recruitment and processivity of the relevant RNA polymerase. Expression of genes is initiated by the binding of transcription factors to the promoter. Basal levels of transcription can be modulated by binding of protein factors to other regulatory regions occurring in the sequences flanking the gene or sometimes within introns of the gene.

Post-transcriptional regulation of gene expression

This category overlaps with the previous one, since it includes mechanisms operating at the level of RNA processing, such as RNA splicing, which may more accurately be considered co-transcriptional rather than post-transcriptional (Steinmetz, 1997). In addition to RNA processing, other levels at which control of gene expression can be exerted include mRNA transport, translation, mRNA stability, protein processing, protein targeting and protein stability.

A surprising variety of mechanisms are employed at the level of RNA processing with single alleles in an individual often able to generate a variety of different gene products (isoforms). The occurrence of these and other mechanisms has required a more flexible definition of the term gene than has formerly been used (see below). Several mechanisms are involved in regulating gene expression at the level of translation and an increasing number of regulatory sequences have been identified in the 5′ and especially 3′ untranslated regions of mRNA. Control of gene expression at the level of protein processing, targeting and stability has been shown in certain systems. For example, activation of some peptide hormones such as insulin requires post-translational cleavage from precursor forms (see Figure 1.23).

Epigenetic mechanisms and long range control of gene expression

Factors which can be transmitted to progeny cells following cell division, but which are not directly attributable to the DNA sequence, are described as epigenetic. DNA methylation is an epigenetic mechanism which plays an important part in mammalian gene control, acting as a general method of maintaining repression of transcription. In addition, a variety of other mechanisms which affect the chromatin environment of a gene, and hence its capacity for expression, are known to operate in mammalian cells. In some cases, the mechanisms ensure that within a cell only one of the two parentally inherited alleles is normally expressed, even though the nucleotide sequence of the allele which is not expressed may be identical to that of the one which is expressed.

Table 8.1 provides an overview of the different types of mechanism known to be involved in regulating expression of human genes.

Table 8.1. Overview of the regulation of gene expression in human cells.

8.2. Control of gene expression by binding of trans-acting protein factors to cis-acting regulatory sequences in DNA and RNA

A common molecular basis for much of the control of gene expression (whether it occurs at the level of initiation of transcription, RNA processing, translation or RNA transport) is the binding of protein factors to regulatory nucleic acid sequences. The latter can be DNA sequences found in the vicinity of the gene or even within it, or RNA transcript sequences at the level of precursor RNA or mRNA. As the protein factors engaged in regulating gene expression are themselves encoded by distantly located genes, they are required to migrate to their site of action, and so are called trans-acting factors. In contrast, the regulatory sequences to which they bind are on the same DNA or RNA molecule as the gene or RNA transcript that is being regulated. Such sequences are said to be cis-acting.

Control by DNA-protein binding

A major control of gene expression in eukaryotic cells is exerted at the level of initiation of transcription where three different types of RNA polymerase are known to transcribe different classes of genes (see Table 1.3). All three types of RNA polymerase are large enzymes, consisting of 8–14 subunits, and in each case the polymerase is recruited to transcribe a gene following binding of proteins (transcription factors) to specific regulatory DNA sequences within the gene or in its vicinity. Chromatin is a highly organized and densely packed structure which does not easily afford access to RNA polymerases and so transcription factors are required to help activate it to give a more open structure that will enable transcription to take place.

Control by RNA-protein binding

In addition to transcription factors, RNA-binding proteins are used to regulate gene expression. The best-studied examples involve binding to regulatory sequences in the untranslated sequences of mRNA, permitting translational control of gene expression. Specific RNA-protein binding interactions are also expected to be involved in the control of gene expression at the level of differential RNA processing, as in the case of binding of SR and hnRNP proteins to pre-mRNA in order to modulate the choice of exons in splicing. The latter mechanisms are considered separately in Section 8.3 to illustrate the tremendous complexity of expression mechanisms that can be used to decode single genes, and the significance of the large numbers of isoforms that can be produced as a result.

8.2.1. Ubiquitous transcription factors are required for transcription by RNA polymerases I and III

RNA polymerases I and III in eukaryotic cells are dedicated to transcribing genes to give RNA molecules (rRNA, tRNA, etc.) which assist in expression of the polypeptide-encoding genes. The transcribed genes are housekeeping genes since rRNA and tRNA are required in essentially all cells to assist in protein synthesis. As a result, ubiquitous transcription factors are required to assist RNA polymerases I and III.

Transcription by RNA polymerase I

RNA polymerase I is confined to the nucleolus and is devoted to transcription of the 18S, 5.8S and 28S rRNA genes. The latter are consecutively organized on a common 13 kb transcription unit (Figure 8.1). A compound unit of the 13-kb transcription unit and an adjacent 27 kb nontranscribed spacer is tandemly repeated about 50–60 times on the short arms of each of the five human acrocentric chromosomes, at the nucleolar organizer regions (see Figure 2.18). The resulting five clusters of rRNA genes, each about 2 Mb long, are referred to as ribosomal DNA or rDNA.
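The sizes quoted above can be cross-checked with simple arithmetic. The short Python sketch below uses only the numbers given in the text (a 13 kb transcription unit, a 27 kb spacer, about 50–60 repeats per cluster, five clusters); the variable and function names are illustrative, and the totals are rough estimates rather than measured values.

```python
# Sizes quoted in the text, in kilobases (kb).
TRANSCRIPTION_UNIT_KB = 13
SPACER_KB = 27
REPEATS_PER_CLUSTER = (50, 60)  # approximate range given in the text
CLUSTERS = 5                    # one cluster per acrocentric chromosome

# One compound repeat unit = transcription unit + nontranscribed spacer.
repeat_unit_kb = TRANSCRIPTION_UNIT_KB + SPACER_KB


def cluster_size_mb(repeats):
    """Approximate size of one rDNA cluster in megabases."""
    return repeat_unit_kb * repeats / 1000


low, high = (cluster_size_mb(n) for n in REPEATS_PER_CLUSTER)
print(f"repeat unit: {repeat_unit_kb} kb")
print(f"one cluster: {low:.1f}-{high:.1f} Mb")
print(f"five clusters: {CLUSTERS * low:.0f}-{CLUSTERS * high:.0f} Mb")
```

The 40 kb repeat unit matches the figure given in the legend to Figure 8.1, and 50 repeats of 40 kb give the quoted cluster size of about 2 Mb.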

Figure 8.1. The major human rRNA species are synthesized by cleavage from a common 13 kb transcription unit which is part of a 40 kb tandemly repeated unit. Small arrows indicated by letters A-D signify positions of endonuclease cleavage of RNA precursors. Cleavage ...

Transcription of the 28S, 5.8S and 18S rRNA genes is initiated following binding of two transcription factors to a core promoter element at the transcription initiation site and to an upstream control element located over 100 nucleotides upstream. One of the transcription factors, UBF (upstream binding factor), is a homodimer, and its identical subunits may bind first to the core promoter element and the upstream control element, bringing them together so that they can be bound by the second factor, SL1 (selectivity factor 1; known in mouse as TIF-IB; Figure 8.2). The bound transcription factors subsequently recruit RNA polymerase I to form an initiation complex.

Figure 8.2. Initiation of transcription by RNA polymerase I. One possible model envisages initial binding of the two identical subunits of the upstream binding factor to the upstream control element and the core promoter element, and forcing these two sequences to ...

The primary transcript expressed from the single 13 kb transcription unit is a 45S precursor rRNA which undergoes a variety of cleavage reactions and base-specific modifications (carried out by a large number of different types of small nucleolar RNA (snoRNA)) to generate the mature 28S, 5.8S and 18S rRNA species (see Figure 8.1). Thus, these genes differ from the vast majority of nuclear genes, which are individually transcribed. Instead, rDNA transcription resembles mtDNA transcription (see Section 7.1.1 and Figure 7.2): both result in multigenic transcripts which yield functionally related products. This unusual use of multigenic primary transcripts is no different in principle, however, from the way in which a single primary translation product is occasionally cleaved to generate two or more functionally related polypeptides (see the example of human insulin in Figure 1.23).

Transcription by RNA polymerase III

RNA polymerase III is also involved in transcription of a variety of housekeeping genes, encoding various small stable RNA molecules such as 5S rRNA, tRNA molecules, 7SL RNA and some of the snRNA molecules needed for RNA splicing. These genes are characterized by promoters that lie within the coding sequence of the gene, rather than upstream of it. In tRNA genes, the promoter is bipartite, consisting of two well conserved sequences, the A box and the B box, while in the 5S rRNA gene, a single promoter element is present, the C box. In each case, transcription by RNA polymerase III is thought to proceed by binding of ubiquitous transcription factors to the promoter elements, followed by subsequent binding of other factors and finally recruitment of the polymerase (Figure 8.3).

Figure 8.3. tRNA and 5S rRNA genes have promoters located within the coding sequence. (A) Positions of promoter elements in tRNA and 5S rRNA genes. The promoter elements A and B in the tRNA genes are located in the sequences specifying the D loop and the TΨCG loops ...

Figure 8.4. Conserved locations in complex eukaryotes for regulatory promoter elements bound by ubiquitous transcription factors. Note that the core promoter of individual genes need not contain all elements. For example, many promoters lack a TATA box and use instead ...

Figure 8.5. The human insulin gene promoter contains a variety of sequence elements recognized by ubiquitous and tissue-specific transcription factors. Arrows indicate binding of transcription factors (top row) to regulatory sequence elements present upstream of ...

8.2.2. Transcription of polypeptide-encoding genes often requires complex sets of cis-acting transcriptional control sequences and tissue-specific transcription factors

RNA polymerase II is responsible for transcribing all genes which encode polypeptides and also certain snRNA genes. Like RNA polymerases I and III, RNA polymerase II is dependent on auxiliary general transcription factors (the usual nomenclature has a common prefix TF to denote transcription factor, followed by a Roman numeral to denote the associated RNA polymerase). In the case of RNA polymerase II, there are a variety of auxiliary transcription factors such as TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, etc., which can be complex in structure (Nikolov and Burley, 1997). For example, the TATA box-binding protein (TBP) is only one of the multiple protein subunits that make up TFIID; the associated proteins are known as TBP-associated factors, or TAF proteins. The complex of polymerase and general transcription factors is known as the basal transcription apparatus; it constitutes all that is required to initiate transcription. Genes are constitutively expressed at a minimum rate determined by the core promoter (see below) unless the rate of transcription is increased or switched off by additional positive or negative regulatory elements (which may be located some distance away) or by intrinsic components of the promoter itself.
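The naming convention just described (prefix TF, then a Roman numeral for the polymerase) can be made concrete with a small parser. This is only an illustrative sketch: the function name `parse_general_tf` is hypothetical, and the assumption that the numeral is followed by a single capital letter is read off the examples listed above (TFIIA to TFIIH) rather than stated as a rule in the text. Factors with other kinds of name, such as UBF, SL1 or TBP, are deliberately rejected.

```python
import re

# TF + Roman numeral (I, II or III) + one capital letter, e.g. TFIID.
# Longer numerals are tried first so that "TFIIIA" is read as III + A.
_TF_NAME = re.compile(r"^TF(III|II|I)([A-Z])$")


def parse_general_tf(name):
    """Split a general transcription factor name into polymerase and factor letter."""
    match = _TF_NAME.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the TF + numeral + letter pattern")
    numeral, letter = match.groups()
    return {"polymerase": f"RNA polymerase {numeral}", "factor": letter}


for tf in ("TFIIA", "TFIID", "TFIIH", "TFIIIA"):
    print(tf, "->", parse_general_tf(tf))
```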

Some of the genes which encode polypeptides are housekeeping genes, but unlike the products of genes transcribed by RNA polymerases I and III, a large percentage of genes transcribed by RNA polymerase II show tissue-restricted or tissue-specific expression patterns. Since the DNA in different nucleated cells of an individual is essentially identical, the identity of a cell, whether it be a hepatocyte or a T lymphocyte for instance, is defined by the proteins made by the cell. In addition to general ubiquitous transcription factors, therefore, tissue-specific or tissue-restricted transcription factors regulate the expression of many genes which encode polypeptides, by recognizing and binding specific cis-acting sequence elements.

Partly because of the large size of mammalian nuclear genomes and also because of the general need for more sophisticated control systems imposed by having very large numbers of interacting genes, control elements in eukaryotic cells are quite elaborate. Often, regulation of expression of individual human genes is controlled by several sets of cis-acting regulatory elements. While the individual regulatory elements may be composed of multiple short sequence elements (typically 4–8 nucleotides long) distributed over a few hundred base pairs, the different classes of regulatory element which modulate the expression of a single gene may be located at considerable distances. A variety of different types of cis-acting elements can be recognized, including promoters, enhancers, silencers, boundary elements (insulators) and response elements (see Box 8.2).

Box 8.2. Classes of cis-acting sequence elements involved in regulating transcription of polypeptide-encoding genes. Promoters are combinations of short sequence elements usually located in the immediate upstream region of the gene, often within 200 bp of the ...

Tissue specificity and developmental stage specificity of gene expression are often conferred by enhancer and silencer sequences, and a variety of cis-acting sequences have been identified which are specifically recognized by tissue-specific transcription factors. For example, specific expression in erythroid cells is often signalled by one of two sequences: TGACTCAG (or its reverse complement CTGAGTCA), which is recognized by the erythroid-specific transcription factor NF-E2, or the sequence (A/T)GATA(A/G) (or its reverse complement), which is recognized by the GATA series of erythroid-specific transcription factors (see Figure 8.6 for an example). Some other examples of cis-acting sequence elements recognized by tissue-specific or tissue-restricted transcription factors are listed in Table 8.2.
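Because a binding site can lie on either strand, a search for these elements has to consider both the motif and its reverse complement. The following Python sketch shows one way to do this with regular expressions. The two motifs are those quoted above; the function names and the test sequence are made up for illustration, and a real search would normally allow for mismatches and use position weight matrices rather than exact patterns.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


# Motifs as quoted in the text; [AT] and [AG] encode (A/T) and (A/G).
MOTIFS = {
    "NF-E2": "TGACTCAG",
    "GATA": "[AT]GATA[AG]",
}


def find_sites(seq, pattern):
    """Return sorted (start, strand) pairs for matches on either strand.

    start is the 0-based position of the leftmost matched base on the
    strand that was passed in, whichever strand carries the motif.
    """
    seq = seq.upper()
    hits = []
    # A lookahead is used so that overlapping matches are also reported.
    for m in re.finditer(f"(?=({pattern}))", seq):
        hits.append((m.start(), "+"))
    for m in re.finditer(f"(?=({pattern}))", reverse_complement(seq)):
        start = len(seq) - m.start() - len(m.group(1))
        hits.append((start, "-"))
    return sorted(hits)


# Made-up test sequence containing one GATA site and one NF-E2 site.
test_seq = "CCTGATAAGGCTGAGTCACC"
for name, pattern in MOTIFS.items():
    print(name, find_sites(test_seq, pattern))
```

In the test sequence the GATA element is found on the given strand, whereas the NF-E2 element is present as CTGAGTCA and is therefore reported on the opposite strand.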

Figure 8.6. The HS-40 α-globin regulatory site contains many recognition elements for erythroid-specific transcription factors. Note that the HS-40 site appears to be a locus control region for the α-globin gene cluster (see Section 8.5.2).

Table 8.2. Examples of cis-acting sequences recognized by tissue-restricted and tissue-specific transcription factors.

In addition to elements that actively promote tissue-specific transcription, some cis-acting silencer elements confer tissue or developmental stage specificity by blocking expression in all but the desired tissue. For example, the neural restrictive silencer element (NRSE) represses expression of several genes in all tissues other than neural tissues (Schoenherr et al., 1996). A transcription factor that binds to the NRSE, variously called the neural restrictive silencer factor (NRSF) or the RE-1 silencing transcription factor (REST), is ubiquitously expressed in non-neural tissue and in neuronal precursors during early development, but is subsequently not expressed in more mature (postmitotic) neurons.

8.2.3. Transcription factors contain conserved structural motifs that permit DNA binding

Transcription factors recognize and bind a short nucleotide sequence, usually as a result of extensive complementarity between the surface of the protein and surface features of the double helix in the region of binding. Although the individual interactions between the amino acids and nucleotides are weak (usually hydrogen bonds, ionic bonds and hydrophobic interactions), the region of DNA-protein binding is typically characterized by about 20 such contacts, which collectively ensure that the binding is strong and specific. In human and other eukaryotic transcription factors, two distinct functions can often be identified and located in different parts of the protein:

  • An activation domain. As the name suggests, this type of domain functions in activating transcription of the target genes once the transcription factor has bound to them. Activation domains are thought to stimulate transcription by interacting with basal transcription factors so as to assist the formation of the transcription complex on the promoter. Although activation domains are not so well studied as DNA-binding domains, some are known to be rich in aspartate and glutamate residues (acidic activation domains); others are rich in proline or glutamine.
  • A DNA-binding domain. This type of domain is necessary to permit specific binding of the transcription factor to its target genes. In contrast to activation domains, DNA-binding domains of transcription factors have been well-studied. A number of conserved structural motifs have been identified which are common to many different transcription factors with quite different specificities, including the leucine zipper, helix-loop-helix, helix-turn-helix, and zinc finger motifs which are described below. Each of the motifs uses α-helices (or occasionally β-sheets; see Figure 1.24) to bind to the major groove of DNA. Clearly, although the motifs in general provide the basis for DNA binding, the precise collection of sequence elements in the DNA-binding domain will provide the basis for the required sequence-specific recognition. Most transcription factors bind to DNA as homodimers, with the DNA-binding region of the protein usually distinct from the region responsible for forming dimers.

The leucine zipper motif

The leucine zipper is a helical stretch of amino acids rich in leucine residues (typically occurring once every seven amino acid residues, i.e. once every two turns of the helix; see Figure 8.7), which readily forms a dimer. Each monomer unit consists of an amphipathic α-helix (hydrophobic side groups of the constituent amino acids face one way; polar groups face the other way; see Figure 1.24). The two α-helices of the individual monomers join together over a short distance to form a coiled-coil (see Section 1.5.5), with the predominant interactions occurring between opposed hydrophobic amino acids of the individual monomers. Beyond this region the two α-helices separate, so that the overall dimer is a Y-shaped structure. The dimer is thought to grip the double helix much like a clothes peg grips a clothes line (Figure 8.8). In addition to forming homodimers, leucine zipper proteins can occasionally form heterodimers, depending on the compatibility of the hydrophobic surfaces of the two different monomers. Such heterodimer formation provides an important combinatorial control mechanism in gene regulation.
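The regular spacing of leucines described above (one every seven residues) is simple to look for in a protein sequence. The Python sketch below is a deliberately naive illustration: the function name, the threshold of four repeats and the test peptide are all assumptions made for the example, and the check ignores everything else that defines a genuine zipper, such as helical propensity and the neighbouring basic region.

```python
def heptad_leucine_runs(seq, min_repeats=4):
    """Find runs of leucines spaced exactly seven residues apart.

    Returns a list of (start, count) pairs, where start is the 0-based
    position of the first leucine in the run and count is the number of
    consecutive leucines found at 7-residue intervals. The default
    threshold of 4 is an arbitrary choice for this sketch.
    """
    seq = seq.upper()
    runs = []
    for i, residue in enumerate(seq):
        if residue != "L":
            continue
        # Skip leucines that merely continue a run that started earlier.
        if i >= 7 and seq[i - 7] == "L":
            continue
        count = 1
        j = i + 7
        while j < len(seq) and seq[j] == "L":
            count += 1
            j += 7
        if count >= min_repeats:
            runs.append((i, count))
    return runs


# Made-up peptide: four heptads, each beginning with a leucine.
peptide = "EEKA" + "LKEEVAR" * 4
print(heptad_leucine_runs(peptide))
```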

Figure 8.7. Structural motifs commonly found in transcription factors and DNA-binding proteins. Abbreviations: HTH, helix-turn-helix; HLH, helix-loop-helix. Note that the leucine zipper monomer is amphipathic [i.e. has hydrophobic residues (leucines) consistently ...

Figure 8.8. Binding of conserved structural motifs in transcription factors to the double helix. Note that the individual monomers of the helix-loop-helix (HLH) dimer and the leucine zipper dimer are colored differently to permit distinction, but may be identical (homodimers). ...

The helix-loop-helix motif

The helix-loop-helix (HLH) motif is related to the leucine zipper and should be distinguished from the helix-turn-helix (HTH) motif described in the next section. It consists of two α-helices, one short and one long, connected by a flexible loop. Unlike the short turn in the HTH motif, the loop in the HLH motif is flexible enough to permit folding back so that the two helices can pack against each other; that is, the two helices lie in planes that are parallel to each other, in contrast to the two helices in the HTH motif (Figure 8.7). The HLH motif mediates both DNA binding and protein dimer formation (Figure 8.8) and it permits occasional heterodimer formation. In the latter case, however, heterodimers form between a full-length HLH protein and a truncated HLH protein which lacks the full length of the α-helix necessary to bind to the DNA. The resulting heterodimer is unable to bind DNA tightly. As a result, HLH heterodimers are thought to act as a control mechanism, by enabling inactivation of specific gene regulatory proteins.

The helix-turn-helix motif

The HTH motif is a common motif found in homeodomain proteins and in a number of other transcription factors. It consists of two short α-helices separated by a short amino acid sequence which induces a turn, so that the two α-helices are orientated differently (i.e. the two helices do not lie in the same plane, unlike those in the HLH motif; Figure 8.7). The structure is very similar to the DNA-binding motif of several bacteriophage regulatory proteins such as the λ cro protein, whose binding to DNA has been intensively studied by X-ray crystallography. In the case of both the λ cro protein and eukaryotic HTH motifs, it is thought that while the HTH motif in general mediates DNA binding, the more C-terminal helix acts as a specific recognition helix because it fits into the major groove of the DNA (Figure 8.8), controlling the precise DNA sequence which is recognized.

The zinc finger motif

The zinc finger motif involves binding a zinc ion by four conserved amino acids to form a loop (finger), a structure which is often tandemly repeated. Although several different forms exist, common forms involve binding of a Zn2+ ion by two conserved cysteine residues and two conserved histidine residues, or by four conserved cysteine residues. The resulting structure may then consist of an α-helix and a β-sheet held together by coordination with the Zn2+ ion, or of two α-helices. In either case, the primary contact with the DNA is made by an α-helix binding to the major groove. The so-called Cys2/His2 finger typically comprises about 23 amino acids with neighboring fingers separated by a stretch of about seven or eight amino acids (Figure 8.7).
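A Cys2/His2 finger of the kind described above can be searched for with a simple pattern built around the two cysteines and two histidines. The sketch below assumes the commonly quoted consensus spacing C-X(2-4)-C-X(12)-H-X(3-5)-H, which is not given in the text and is used here only as a working simplification; it spans 21 to 25 residues, in line with the figure of about 23 amino acids. The function name and the test peptide are invented for the example.

```python
import re

# Assumed consensus spacing (not from the text): C-X2-4-C-X12-H-X3-5-H.
C2H2_FINGER = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_fingers(protein):
    """Return (start, matched_sequence) for each non-overlapping candidate finger."""
    protein = protein.upper()
    return [(m.start(), m.group(0)) for m in C2H2_FINGER.finditer(protein)]


# Made-up peptide with one candidate finger.
peptide = "GSCPECGKAFSRSDELTRHQRTHTG"
for start, finger in find_c2h2_fingers(peptide):
    print(start, finger, len(finger))
```

A match of this kind only indicates that the four zinc-coordinating residues are spaced appropriately; it says nothing about whether the sequence actually folds into a finger or binds DNA.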

8.2.4. A variety of mechanisms permit transcriptional regulation of gene expression in response to external stimuli

In eukaryotic cells, gene expression can be altered in a semipermanent way as cells differentiate, or in a temporary, easily reversible way in response to extracellular signals (inducible gene expression). Environmental cues such as the extracellular concentrations of certain ions and small nutrient molecules, temperature, shock, etc., can result in dramatic alteration of gene expression patterns in cells exposed to changes in these parameters. In complex multicellular animals there are also fundamental requirements for cells to communicate with each other and different modes of cell signaling are possible (Table 8.3). In some cases, alteration of gene expression is conducted at the translational level which can offer certain advantages (Section 8.2.5). In other cases, gene expression is altered by modulating transcription.

Table 8.3. Different modes of cell signaling.

Transcriptional regulation in response to cell signaling can take different forms, but the endpoint is always the same: a previously inactive transcription factor is specifically activated by the signaling pathway and then subsequently binds to specific regulatory sequences located in the promoters of target genes, thereby activating their transcription. In the case of transcription regulated by signaling molecules or their intermediaries, such regulatory sequences are often referred to as response elements (see Table 8.4).

Table 8.4. Examples of response elements in inducible gene expression.

Ligand-inducible transcription factors

Small hydrophobic hormones and morphogens such as steroid hormones, thyroxine and retinoic acid are able to diffuse through the plasma membrane of the target cell and bind intracellular receptors in the cytoplasm or nucleus. These receptors (often called hormone nuclear receptors) are inducible transcription factors. Following binding of the homologous ligand, the receptor protein associates with a specific DNA response element located in the promoter regions of perhaps 50–100 target genes and activates their transcription.

Although thyroxine and retinoic acid are structurally and biosynthetically unrelated to the steroid hormones, their receptors belong to a common nuclear receptor superfamily. Two conserved domains characterize the family: a centrally located DNA binding domain of about 68 amino acids, and a ~240 amino acid ligand-binding domain located close to the C terminus (Figure 8.9). The DNA binding domain contains zinc fingers and binds as a dimer with each monomer recognizing one of two hexanucleotides in the response element. The two hexanucleotides are either inverted repeats or direct repeats which are typically separated by three or five nucleotides (Figure 8.9). In the absence of the ligand, the receptor is inactivated by direct repression of the DNA binding domain function by the ligand-binding domain, or by binding to an inhibitory protein, as in the case of the glucocorticoid receptor (Figure 8.10).
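The layout of a response element (two hexanucleotide half-sites arranged as a direct or an inverted repeat around a short spacer) can be expressed as a small classification routine. The Python sketch below is illustrative only: the function names are hypothetical, and the two example elements are textbook-style consensus sequences chosen for the demonstration, not sequences taken from this chapter.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_response_element(site):
    """Classify a candidate element made of two 6-nt half-sites and a spacer.

    Returns (kind, spacer_length). The first and last six nucleotides are
    taken as the half-sites; whatever lies between them is the spacer.
    A half-site that is its own reverse complement would satisfy both
    definitions and is reported here as a direct repeat.
    """
    site = site.upper()
    if len(site) < 12:
        raise ValueError("need at least two hexanucleotide half-sites")
    left, right = site[:6], site[-6:]
    if right == left:
        kind = "direct repeat"
    elif right == reverse_complement(left):
        kind = "inverted repeat"
    else:
        kind = "no repeat"
    return kind, len(site) - 12


# Illustrative elements (N marks an unspecified spacer nucleotide).
print(classify_response_element("AGAACANNNTGTTCT"))
print(classify_response_element("AGGTCANNNNNAGGTCA"))
```

In an inverted repeat the second half-site is the reverse complement of the first, so the two monomers of a receptor dimer meet the same sequence on opposite strands; in a direct repeat both half-sites read the same way on one strand.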

Figure 8.9. Steroid receptors and the respective response elements. (A) Structure of members of the nuclear receptor superfamily. Numbers refer to protein size in amino acids. Abbreviations: ER, estrogen receptor; GR, glucocorticoid receptor; PR, progesterone receptor; ...

Figure 8.10. Transcriptional regulation by glucocorticoids. The glucocorticoid receptor is normally inactivated by being bound to an inhibitor protein, Hsp90. Binding of glucocorticoids to the glucocorticoid receptor releases Hsp90, the receptor dimerizes and then ...

Activation of transcription factors by signal transduction

Unlike lipid-soluble hormones or morphogens, hydrophilic signaling molecules, such as polypeptide hormones, cannot diffuse through the plasma membrane. Instead, they bind to a specific receptor on the cell surface. After binding of the ligand molecule, the receptor undergoes a conformational change and becomes activated in such a way that it passes on the signal via other molecules within the cell (signal transduction). Various classes of cell surface receptor are known, but many of them have a kinase activity or can activate intracellular kinases (see Table 8.5). Signal transduction pathways are often characterized by complex regulatory interplay between kinases and phosphatases, which can activate or repress intermediates by phosphorylation/dephosphorylation. In many cases, the phosphorylation or dephosphorylation induces an altered conformation. In the case of activation of a signaling molecule, the altered conformation often means that a signaling factor is no longer inhibited by some repressor sequence present in an inhibitory protein to which it is bound, or in a domain or sequence motif within its own structure.

Table 8.5. Major classes of cell surface receptor.

In terms of transcriptional activation, two general mechanisms permit rapid transmission of signals from cell-surface receptors to the nucleus, both involving protein phosphorylation:

  • protein kinases are activated and then translocated from the cytoplasm to the nucleus where they phosphorylate target transcription factors;
  • inactive transcription factors stored in the cytoplasm are activated by phosphorylation and translocated into the nucleus.

The following two sections provide examples to illustrate these two mechanisms (see also Karin and Hunter, 1995).

Hormonal signaling through the cyclic AMP pathway

Cyclic AMP is an important second messenger (see Table 8.6) which acts in response to a variety of hormones and other signaling molecules. It is synthesized from ATP by a membrane-bound enzyme, adenylate cyclase. Hormones which activate adenylate cyclase bind to a cell surface receptor of the G protein-coupled receptor class. Binding of the hormone to the receptor promotes the interaction of the receptor with a G protein which consists of three subunits, α, β and γ. Following this interaction, the α subunit of the G protein is activated, causing it to dissociate and stimulate adenylate cyclase.

Table 8.6. Examples of secondary messengers in cell signaling.

The increase in intracellular cAMP produced by activated adenylate cyclase can then activate the transcription of specific target genes that contain a cAMP response element or CRE. This function of cAMP is mediated by the enzyme protein kinase A. Cyclic AMP binds to protein kinase A and activates it by permitting release of the two catalytically active subunits, which then enter the nucleus and phosphorylate a specific transcription factor, CREB (CRE-binding protein). Activated CREB then activates transcription of genes with the cAMP response element (Figure 8.11A).

Figure 8.11. Selected target genes can be actively expressed in response to extracellular stimuli by signal transduction from activated cell surface receptors. (A) An example of activation of a protein kinase and translocation to the nucleus: hormonal signaling through (more...)

Activation of NF-κB via protein kinase C signaling

NF-κB is a transcription factor which is involved in a variety of aspects of the immune response. In its inactive state, NF-κB is retained in the cytoplasm where it is complexed with an inhibitory subunit, IκB. However, IκB can be targeted for degradation following phosphorylation by protein kinase C. The consequent destruction of IκB permits NF-κB to translocate to the nucleus and activate its various target genes. Protein kinase C is itself activated by diacylglycerol, which is produced when binding of various growth factors and hormones to specific cell surface receptors triggers activation of receptor-linked phospholipase C activity. The activated enzyme converts PIP2 (phosphatidylinositol 4,5-bisphosphate) to IP3 (inositol 1,4,5-trisphosphate) and diacylglycerol (Figure 8.11B).

8.2.5. Translational control of gene expression can involve specific recognition by RNA-binding proteins of regulatory sequences within the untranslated sequences of mRNA

Different forms of translational control of gene expression are evident and an increasing number of eukaryotic and mammalian mRNA species have been shown to contain regulatory sequences in their untranslated sequences (most frequently at the 3′ end; see Wickens et al., 1997; Day and Tuite, 1998). Several eukaryotic and mammalian RNA-binding proteins have also been identified and shown to bind to specific regulatory sequences present in untranslated sequences, thereby providing the basis for translational control of gene expression (Siomi and Dreyfuss, 1997). A variety of different RNA-binding domains have been identified and they include elements which have previously been associated with DNA-binding properties of transcription factors such as zinc fingers and homeodomains (see Section 8.2.3 and Siomi and Dreyfuss, 1997).

Intracellular RNA localization

The interaction between cis-acting regulatory elements in RNA and trans-acting RNA-binding proteins can be envisaged to alter RNA structure in various ways: facilitating or hindering interactions with other trans-acting factors; altering higher order RNA structure; bringing together initially remote RNA sequences; or providing localization or targeting signals for transport of RNA molecules to specific intracellular locations. In the last case, numerous eukaryotic and mammalian mRNAs are known to be transported as RNP particles to specific locations within cells, and transport on microtubules and actin filaments has been demonstrated in some cases to require specific molecular motors (Hazelrigg, 1998). For example, tau mRNA is localized to the proximal portions of axons rather than to dendrites, where many mRNA molecules are located in mature neurons, and myelin basic protein mRNA is transported with the aid of kinesin to the processes of oligodendrocytes.

A rationale for such intracellular mRNA localization mechanisms is that they may provide a more efficient way to localize protein products than protein targeting: a single mRNA can give rise to many protein molecules, provided that it can engage with ribosomes. Different sequential steps have been envisaged: initial translational repression, transport within the cell, localization (to the specific subcellular destination) and then localization-dependent translation. Recently, key regulatory sequences which are required for various steps in this process have been identified in the untranslated sequences, predominantly the 3′ UTR, of many mRNA species (Hazelrigg, 1998). For example, two elements within the 3′ UTR of myelin basic protein mRNA are required for distinct steps in its transport to the processes of oligodendrocytes: a 21 nucleotide RNA transport sequence and a longer RNA localization region (Ainger et al., 1997).

Translational control of gene expression in response to external stimuli

Translational control of gene expression can permit a more rapid response to altered environmental stimuli than the alternative of activating transcription. Iron metabolism provides two useful examples. Increased iron levels stimulate the synthesis of the iron-binding protein, ferritin, without any corresponding increase in the amount of ferritin mRNA. Conversely, decreased iron levels stimulate the production of transferrin receptor (TfR) without any effect on the production of transferrin receptor mRNA. The 5′ UTRs of ferritin heavy chain mRNA and light chain mRNA each contain a single iron-response element (IRE), a specific cis-acting regulatory sequence which forms a hairpin structure. Several such IRE sequences are also found in the 3′ UTR of the transferrin receptor mRNA (see Klausner et al., 1993). Regulation is exerted by binding of the IREs by a specific IRE-binding protein which is activated at low iron levels (Figure 8.12).

Figure 8.12. The IRE-binding protein regulates the production of ferritin heavy chain and transferrin receptor by binding to iron-response elements (IREs) in the 5′- or 3′-untranslated regions. (A) Structure of the IRE in the 5′-UTR of the ferritin (more...)
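
The reciprocal iron response described above can be summarized as a small decision rule. The following is a minimal sketch, not a quantitative model: the function name `iron_response` and the qualitative labels "high" and "low" are illustrative assumptions, and the only biology encoded is what the text states (the IRE-binding protein is active at low iron; ferritin synthesis is stimulated by high iron; transferrin receptor production is stimulated by low iron).

```python
from typing import Dict


def iron_response(iron_is_low: bool) -> Dict[str, object]:
    """Qualitative outcome of IRE-mediated control (hypothetical helper)."""
    # The IRE-binding protein is activated when iron levels are low.
    ire_bp_active = iron_is_low
    return {
        "ire_bp_active": ire_bp_active,
        # Ferritin mRNA carries its IRE in the 5' UTR: synthesis is
        # stimulated only when iron is high (IRE-binding protein inactive).
        "ferritin_synthesis": "low" if ire_bp_active else "high",
        # Transferrin receptor mRNA carries IREs in the 3' UTR: production
        # is stimulated when iron is low (IRE-binding protein active).
        "transferrin_receptor_production": "high" if ire_bp_active else "low",
    }


if __name__ == "__main__":
    for low in (True, False):
        print("iron low" if low else "iron high", iron_response(low))
```

The same binding protein therefore produces opposite outcomes for the two mRNAs, depending on whether the IRE lies in the 5′ or the 3′ untranslated region.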

Translational control of gene expression during early development

Gene expression during oocyte maturation and at the earliest embryonic stages is regulated at the level of translation, not transcription. Following fertilization of a human oocyte, no mRNA is made until the 4–8 cell stage, when zygotic transcription is activated, that is, transcription of the genes present in the zygote. Before this time, cell functions are specified by maternal mRNA that was previously synthesized during oogenesis. While it is presently unclear to what extent the regulation of human gene expression parallels that of model organisms at this stage, extrapolation from the latter would suggest that a variety of mRNAs are stored in oocytes in an inactive form, characterized by having short oligo(A) tails. Such mRNAs were previously subject to deadenylation and the resulting short oligo(A) tail means that they cannot be translated. Subsequently, at fertilization or later in development, the stored inactive mRNA species can be activated by cytoplasmic polyadenylation, restoring the normal size poly(A) tail. Cytoplasmic polyadenylation appears to use the same type of poly(A) polymerase activity as in the standard polyadenylation of newly formed mRNA (which occurs in the nucleus). However, in addition to the AAUAAA signal, the mRNA needs to have a uridine-rich upstream cytoplasmic polyadenylation element (Wahle and Kuhn, 1997). Another mechanism that is used to regulate translation of some mRNAs during development is translational repression (masking), whereby RNA-binding proteins recognize and bind specific sequences in the 3′ UTRs of the mRNAs, thereby repressing translation (see Wickens et al., 1997; Stebbins-Boaz and Richter, 1997).
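
The two sequence requirements for cytoplasmic polyadenylation (an AAUAAA signal plus a uridine-rich element upstream of it) can be illustrated with a simple sequence scan. This is only a sketch: the text does not define "uridine-rich", so the threshold used here (at least five U residues in a six-nucleotide window) is an arbitrary assumption, and the function names are hypothetical.

```python
POLY_A_SIGNAL = "AAUAAA"


def has_u_rich_element(rna: str, window: int = 6, min_u: int = 5) -> bool:
    """True if any `window`-nt stretch holds at least `min_u` uridines.

    The window size and threshold are illustrative assumptions.
    """
    return any(
        rna[i:i + window].count("U") >= min_u
        for i in range(len(rna) - window + 1)
    )


def is_candidate_for_cytoplasmic_polyadenylation(utr3: str) -> bool:
    """Check for AAUAAA plus a U-rich element upstream of it."""
    utr3 = utr3.upper().replace("T", "U")  # accept DNA-style input too
    signal_pos = utr3.find(POLY_A_SIGNAL)
    if signal_pos == -1:
        return False  # no polyadenylation signal at all
    # Only the sequence upstream (5') of the signal is searched.
    return has_u_rich_element(utr3[:signal_pos])


if __name__ == "__main__":
    print(is_candidate_for_cytoplasmic_polyadenylation("GCUUUUAUGCCAAUAAAGC"))
    print(is_candidate_for_cytoplasmic_polyadenylation("GCGCGCAAUAAAGC"))
```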

8.3. Alternative transcription and processing of individual genes

In addition to the control that is exerted in selecting specific genes (or their transcripts) for activation or repression, control mechanisms can also select between specific alternative transcripts of a single gene. Differential promoter usage or differential RNA processing events can result in a large number of different isoforms and these and other mechanisms have challenged the classical definition of a gene (Box 8.3).

Figure 8.14. Differential splicing in the WT1 Wilms' tumor gene. The WT1 protein contains a transcriptional regulatory domain and a putative RNA-recognition motif at its N terminus. Four zinc fingers (Zn) are found at its C terminus, each encoded by a separate exon. Alternative (more...)

Box 8.3

The classical view of a gene is no longer valid. Classically, a gene has been viewed as an entity that encodes a single RNA or polypeptide product. By contrast, the concept of several different products being encoded by a single transcription unit (a (more...)

8.3.1. Transcription of a single human gene can be initiated from a variety of alternative promoters and can result in a variety of tissue-specific isoforms

Several human and mammalian genes are known to have two or more alternative promoters, which can result in different isoforms with different properties (see Ayoubi and van de Ven, 1996). The isoforms can provide:

  • tissue-specificity (a frequent occurrence; see the example of the human dystrophin gene below);
  • developmental stage-specificity (e.g. the insulin-like growth factor II gene);
  • differential subcellular localization (e.g. soluble and membrane-bound isoforms);
  • differential functional capacity (as in the case of the progesterone receptor);
  • sex-specific gene regulation (see the case of the Dnmt1 methyltransferase gene in Section 8.4.2 and Figure 8.20).

Figure 8.20. Sex-specific regulation of the Dnmt1 methyltransferase gene. The Dnmt1 methyltransferase gene appears to be the predominant maintenance DNA methyltransferase in mammalian cells and may be the major de novo methyltransferase too. It is highly expressed (more...)

One of the most celebrated examples of differential promoter usage in humans concerns the giant dystrophin (DMD) gene which comprises a total of more than 79 exons distributed over about 2.4 Mb of DNA in Xp21. At least eight different alternative promoters can be used (Cox and Kunkel, 1997). Four of the alternative promoters are located near the conventional start site and comprise a brain cortex-specific promoter, a muscle-specific promoter located 100 kb downstream, a promoter which is used in Purkinje cells of the cerebellum and located a further 100 kb downstream, and a lymphocyte-specific promoter (see Figure 8.13). Usage of these promoters results in large isoforms with a molecular weight of 427 kDa (referred to as Dp427 where Dp = Dystrophin protein and often given a suffix to indicate tissue specificity e.g. Dp427-M to indicate the muscle specific isoform). The four Dp427 isoforms differ in their extreme N-terminal amino acid sequence as a result of using four different alternatives for exon 1.

Figure 8.13. At least eight distinct promoters can be used to generate cell type-specific expression of the dystrophin gene. The positions of the eight alternative promoters are illustrated at the top: L, lymphocyte; C, cortical; M, muscle; P, Purkinje; R, retinal; CNS, (more...)

In addition to the four alternative promoters encoding the conventional large isoforms, at least four other alternative internal promoters can be used. Transcription from these promoters uses only a downstream subset of the exons, resulting in significantly smaller isoforms: a Dp260 isoform produced in retinal cells; a Dp140 isoform produced by many cells in the brain and kidney; a Dp116 isoform produced in Schwann cells and a small Dp71 isoform produced in many cell types (see Figure 8.13). Note that the alternative usage of promoters enforces alternative use of exons but that alternative splicing events which are independent of differential promoter usage are also very common (see Section 8.3.2). In the case of dystrophin, for example, additional isoform complexity is introduced by alternative splicing, especially at the C terminus.
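
The promoter-to-isoform relationships described in the two preceding paragraphs can be collected into a small lookup structure. This is purely an illustrative sketch: the record type `Promoter`, the list name `DMD_PROMOTERS` and the helper function are hypothetical names, and the entries simply restate what the text says about each promoter.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Promoter(NamedTuple):
    expressed_in: str  # cell type or tissue, as described in the text
    isoform: str       # Dp = dystrophin protein, number = size in kDa
    internal: bool     # True if only a downstream subset of exons is used


DMD_PROMOTERS: List[Promoter] = [
    Promoter("brain cortex", "Dp427", False),
    Promoter("muscle", "Dp427", False),  # the Dp427-M isoform
    Promoter("Purkinje cells", "Dp427", False),
    Promoter("lymphocytes", "Dp427", False),
    Promoter("retinal cells", "Dp260", True),
    Promoter("many cells in brain and kidney", "Dp140", True),
    Promoter("Schwann cells", "Dp116", True),
    Promoter("many cell types", "Dp71", True),
]


def promoters_by_isoform(promoters: List[Promoter]) -> Dict[str, List[str]]:
    """Group the sites of promoter usage by the isoform size class produced."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for promoter in promoters:
        grouped[promoter.isoform].append(promoter.expressed_in)
    return dict(grouped)


if __name__ == "__main__":
    for isoform, sites in promoters_by_isoform(DMD_PROMOTERS).items():
        print(isoform, "<-", ", ".join(sites))
```

Grouping by isoform makes the key point visible: four promoters converge on the same 427 kDa size class (differing only in exon 1), whereas each internal promoter yields its own shorter product.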

8.3.2. Human genes often encode more than one product as a result of alternative splicing and alternative polyadenylation events

In addition to differential use of promoters, which can enforce alternative use of exons, a variety of alternative RNA processing events can also result in alternative isoforms. The primary mechanisms are alternative splicing events (that is, distinct from those induced by differential promoter usage) and alternative polyadenylation events. In many cases a combination of these mechanisms operates during the processing of transcripts from a single gene. Together with the additional possibility of differential promoter usage, these mechanisms can result in very large numbers of isoforms for a single gene.

Alternative splicing

A large percentage of human genes undergo alternative splicing whereby different exon combinations are included in transcripts from the same gene during RNA processing. For many genes, numerous isoforms can be generated at the RNA level, but often the functional significance is poorly understood. In some genes, alternative splicing results in very considerable diversity in the untranslated regions. For example, in the liver alternative splicing results in at least eight different 5′ UTR sequences for human growth hormone receptor mRNA (Pekhletsky et al., 1992), but the functional significance, if any, is not understood.

Alternative splicing of coding sequence exons is also common and some of the resulting protein isoforms have been shown to be tissue specific, so that individual exons present in one isoform but not in others may be termed ‘muscle-specific’, ‘brain-specific’ etc. The different isoforms can provide a variety of possibilities for altered functional properties but detailed knowledge of the functional significance of the different isoforms is still comparatively sparse (see Box 8.4).

Box 8.4

Alternative splicing can alter the functional properties of a protein. The following, far from exhaustive, list is merely meant to illustrate some ways in which the biological properties of a protein can be altered as a result of alternative splicing. (more...)

The best understood model system for studying the regulation of splicing is the sex determination pathway in Drosophila, which also controls gene dosage. Alternative splicing is used in each branch of this pathway to control the expression of transcriptional regulators or chromatin-associated proteins that influence transcription, and both positive and negative control of splicing is evident (Lopez, 1998). In mammalian cells candidate splice regulators are the SR family of RNA-binding proteins [which have a distinctive C-terminal domain rich in serine (S)-arginine (R) dipeptides] and some hnRNP (heterogeneous nuclear ribonucleoprotein particle) proteins. These proteins are known to promote various steps in assembly of spliceosomes and they are also known to bind to splicing enhancer sequences, regulatory sequences which can enhance splice site recognition (Lopez, 1998).

Alternative polyadenylation

The usage of alternative polyadenylation signals is also quite common in human mRNA, and different types of alternative polyadenylation have been identified (see Edwards-Gilbert et al., 1997). In many genes, two or more polyadenylation signals are found in the 3′ UTR and the alternatively polyadenylated transcripts can show tissue specificity; in other cases, alternative polyadenylation signals may be brought into play following alternative splicing. As an example of the latter, a combination of alternative splicing and alternative polyadenylation of the calcitonin gene (CALC) results in tissue-specific expression of two isoforms. Calcitonin, a circulating Ca2+ homeostatic hormone, is produced in the thyroid; the calcitonin gene-related peptide (CGRP), which may have both neuromodulatory and trophic activities, is synthesized in the hypothalamus (Figure 8.15).

Figure 8.15. Differential RNA processing results in tissue-specific products of the calcitonin gene. pA1 and pA2 represent alternative polyadenylation signals which are employed in thyroid and neural tissue respectively following alternative splicing. Exon 1 and the (more...)

8.3.3. RNA editing is a rare form of post-transcriptional processing whereby base-specific changes are enzymatically introduced at the RNA level

RNA editing is a form of post-transcriptional processing which can involve enzyme-mediated insertion or deletion of nucleotides or substitution of single nucleotides at the RNA level. Insertion or deletion RNA editing appears to be a peculiar property of gene expression in mitochondria of kinetoplastid protozoa and slime molds. Substitution RNA editing is frequently employed in some systems, such as the mitochondria and chloroplasts of vascular plants where individual mRNAs may undergo multiple C → U or U → C editing events, and has also been observed in a few mammalian genes (Ashkenas, 1997). At least four different classes of RNA editing are known to occur in human cells:

  • C → U editing. Human APOB lipoprotein mRNA editing has been well-studied. In the liver the APOB gene encodes a 14.1 kb mRNA transcript and a 4536 amino acid product, apoB100. However, in the intestine the same gene encodes a 7 kb mRNA which contains a premature stop codon not present in the gene and encodes a product, apoB48, which is identical in sequence to the first 2152 amino acids of apoB100. A specific cytosine deaminase, Apobec1, converts a single cytosine at nucleotide 6666 in the intestinal APOB mRNA to uridine, thereby generating a stop codon (Figure 8.16).
  • A → I editing. Genes encoding some ligand-gated ion channels including glutamate receptors and related proteins are subject to this type of mRNA editing. An adenosine is deaminated to give inosine (I), a base not normally present in mRNA (the amino group at carbon 6 of adenosine is replaced by a C=O carbonyl group). Inosine base pairs preferentially with cytosine and also interacts with ribosomes during translation as if it were a G. In the case of the glutamate receptor B gene, for example, RNA editing replaces a CAG (glutamine) codon by CIG which is translated as if it were the CGG codon (arginine). This type of editing, which brings about a Gln → Arg substitution, is often referred to as Q/R editing after the single letter code for the two amino acids involved.
  • Other classes of RNA editing. Two other documented forms of editing in human mRNA are the U → C editing in mRNA from the WT1 Wilms' tumor gene and U → A editing in α-galactosidase mRNA (Ashkenas, 1997).

Figure 8.16. Expression of the human apolipoprotein B gene in the intestine involves tissue-specific RNA editing. Note that codon 2153 specified by the CAA triplet at nucleotide positions 6666–6668 specifies glutamine in the ApoB100 product synthesized in (more...)
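
The consequences of the two best-studied editing classes can be traced at the level of a single codon. The sketch below uses only the codons mentioned in the text (CAA, UAA, CAG and CGG); the function names and the four-entry codon table are illustrative assumptions, not a full translation routine.

```python
# Only the codons needed for the two examples; not a complete genetic code.
CODON_TABLE = {"CAA": "Gln", "UAA": "Stop", "CAG": "Gln", "CGG": "Arg"}


def substitute(codon: str, position: int, expected: str, new: str) -> str:
    """Replace one base, checking that the edited base is the expected one."""
    if codon[position] != expected:
        raise ValueError(f"expected {expected} at position {position} of {codon}")
    return codon[:position] + new + codon[position + 1:]


def edit_c_to_u(codon: str, position: int) -> str:
    """C -> U editing (cytosine deamination), as in intestinal APOB mRNA."""
    return substitute(codon, position, "C", "U")


def edit_a_to_i(codon: str, position: int) -> str:
    """A -> I editing (adenosine deamination), as in glutamate receptor B."""
    return substitute(codon, position, "A", "I")


def read_codon(codon: str) -> str:
    """Decode a codon, treating inosine as if it were guanosine."""
    return CODON_TABLE[codon.replace("I", "G")]


if __name__ == "__main__":
    # APOB codon 2153: the C at nucleotide 6666 is the first base of CAA.
    apob_edited = edit_c_to_u("CAA", 0)
    print("CAA ->", apob_edited, "=", read_codon(apob_edited))
    # Glutamate receptor B Q/R site: CAG becomes CIG, which is read as CGG.
    glur_edited = edit_a_to_i("CAG", 1)
    print("CAG ->", glur_edited, "=", read_codon(glur_edited))
```

In the APOB case a single base change truncates the protein after 2152 amino acids; in the glutamate receptor case it changes one amino acid without altering the length of the product.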

8.4. Asymmetry as a means of establishing differential gene expression and DNA methylation as means of perpetuating differential expression

The concept of tissue specificity of human gene expression is long established. What is much less clear is how such patterns get laid down initially. Since the DNA content of all nucleated cells in an organism is virtually identical, genetic mechanisms cannot explain how differential gene expression first develops in cells. To explain this, CH Waddington invoked epigenetic mechanisms of gene control during development. In recent times, a variety of epigenetic mechanisms have been identified, including ones which can perpetuate particular states of gene expression in somatic cell lineages.

8.4.1. Selective gene expression in cells of mammalian embryos most likely develops in response to short range cell-cell signaling events

In order to explain subsequent tissue-, cell- and developmental stage-specific patterns of expression, some mechanism is required to set up an asymmetry or axis in the fertilized egg cell or in very early development. In Drosophila, the egg is inherently asymmetrical because of transfer of gene products from asymmetrically sited nurse cells. The embryo develops initially as a multinucleate syncytium (effectively one big cell) and regionalization depends on the response of individual nuclei to long-range gradients of regulatory molecules. In mammals, however, the egg cell is relatively small and early embryonic development creates an apparently symmetrical aggregate of individual cells. Nevertheless, development becomes asymmetric.

The generation of asymmetry in mammalian cells could derive from early positional cues. Some aspects of early development are inherently asymmetrical, including the point of entry of the sperm during fertilization, the attachment of the embryo to the uterine wall during implantation and the location of cells with respect to their neighbors. As the embryo develops into a ball of cells, and later on as more complex structures develop, individual cells will vary in the number of cell neighbors available. Short-range signaling events (by direct cell-cell contact or by short-range intercellular signaling molecules) can provide a means of identifying cell position, and triggering differential gene expression. For example, if an intercellular signaling molecule has a range of, say, one cell diameter, then the cells at the outside of the blastula will receive different signals from those surrounded by neighbors on all sides, and the different positional cues may be translated into differential gene expression. As particular cell systems develop during, for example, organogenesis (mostly accomplished between the 4th and 9th embryonic weeks), particular cell type growth or differentiation factors may then induce the expression of developmental stage-specific and/or tissue-specific transcription factors.

8.4.2. Vertebrate DNA methylation is very largely confined to CpG dinucleotides and patterns of DNA methylation can be inherited when cells divide

Once differential expression patterns have been set up, epigenetic mechanisms can ensure that they are stably inherited when cells divide. DNA methylation is thought to play a major role in this respect, permitting the stable transmission from a diploid cell to daughter cells of chromatin states which repress gene expression. However, the precise function of DNA methylation in eukaryotes is still imperfectly understood and clearly shows species differences. Some organisms, for example, have no detectable DNA methylation, as in the case of Drosophila, C. elegans and the yeast Saccharomyces cerevisiae. In those organisms where DNA methylation does occur, the patterns and functions of DNA methylation may differ.

Patterns of DNA methylation in vertebrates

The pattern of vertebrate DNA methylation differs from that in bacterial cells. In the latter, adenine and cytosine can both be methylated but in vertebrates methylation of DNA is restricted to cytosine residues. Only about 3% of the cytosines in human DNA are methylated, but most that are methylated are found in the CpG dinucleotide (that is, the methylated cytosines are almost always ones whose 3′ carbon atom is linked by a phosphodiester bond to the 5′ carbon atom of a guanine). In addition, a much smaller percentage of methylated cytosines occur within the sequence CpNpG.

Cytosine residues occurring in CpG dinucleotides in vertebrate DNA are targets for methylation by a specific cytosine methyltransferase. Methylation occurs at carbon atom 5 of the cytosine to generate 5-methylcytosine, which is chemically unstable and can spontaneously deaminate to give thymine (Figure 8.17A). Over long periods of evolutionary time, the number of CpGs in vertebrate DNA has gradually been eroded, although regions of the normal (expected) CpG frequency are known and often mark transcriptionally active sequences (CpG islands, see Box 8.5).

Figure 8.18. CpG island structure in three human genes. Vertical bars represent the positions of the dinucleotide CpG in DNA sequences representing: (A) the human desmin (DES) gene; (B) the 5′ end of the human retinoblastoma (RB1) gene; and (C) the human apolipoprotein (more...)

Figure 8.17. The CpG dinucleotide is underrepresented in vertebrate DNA because it is prone to methylation and deaminated 5-methylcytosine is subject to ineffective DNA repair. (A) Cytosine occurring in the sequence 5′-CpG-3′ is a target for methylation (more...)

Box 8.5

CpG islands. In vertebrate DNA the sequence CpG is a signal for methylation by a specific cytosine DNA methyltransferase, which adds a methyl group to carbon 5 of the cytosine. The resulting 5-methylcytosine is chemically unstable and is (more...)
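
The idea of a "normal (expected) CpG frequency" can be made concrete with an observed/expected ratio. The sketch below uses one commonly used definition, in which the expected number of CpGs is (number of C × number of G) / sequence length; the function name is hypothetical, and the text itself does not specify a formula or any threshold for calling a CpG island.

```python
def cpg_observed_over_expected(seq: str) -> float:
    """Ratio of observed CpG count to the count expected from base composition.

    Expected CpGs = (count of C * count of G) / length, so the ratio is
    (CpG count * length) / (count of C * count of G).
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # no CpG is possible without both bases
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself
    return cpg_count * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    print(cpg_observed_over_expected("CGCGCGCG"))  # CpG-rich toy sequence
    print(cpg_observed_over_expected("CATGCATG"))  # C and G present, no CpG
```

A ratio well below 1 reflects the CpG depletion seen in bulk vertebrate DNA, whereas a ratio close to 1 is what would be found in regions that have escaped erosion, such as CpG islands.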

Maintenance and de novo methylation during development

Unlike bacterial methylases, vertebrate cytosine methyltransferases show a strong preference for recognizing a hemi-methylated DNA target (i.e. one that is already methylated on one strand only). The sequence CpG shows dyad symmetry and so, following DNA replication, the newly synthesized DNA strands will receive the same CpG methylation pattern as the parental DNA (Figure 8.17B). As a result, the CpG methylation pattern can be stably transmitted to daughter cells. The perpetuation of a pre-existing methylation pattern is sometimes known as maintenance methylation and is carried out in mammalian cells by the product of the Dnmt1 gene.
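
The logic of maintenance methylation can be followed through one round of replication. In this minimal sketch a duplex is represented by the set of methylated CpG sites on each strand; because CpG has dyad symmetry, the same site index refers to the paired CpG on both strands. The representation and function names are illustrative assumptions.

```python
from typing import FrozenSet, List, Tuple

# Methylated CpG site indices on the (top, bottom) strands of one duplex.
Duplex = Tuple[FrozenSet[int], FrozenSet[int]]


def replicate(duplex: Duplex) -> List[Duplex]:
    """Semi-conservative replication: new strands start unmethylated."""
    top, bottom = duplex
    unmethylated: FrozenSet[int] = frozenset()
    # Each daughter keeps one parental strand, giving hemi-methylated sites.
    return [(top, unmethylated), (unmethylated, bottom)]


def maintenance_methylate(duplex: Duplex) -> Duplex:
    """Methylate the partner strand at every hemi-methylated CpG site.

    Sites unmethylated on both strands are left untouched, so no new
    (de novo) methylation is introduced.
    """
    top, bottom = duplex
    methylated_sites = top | bottom
    return (methylated_sites, methylated_sites)


if __name__ == "__main__":
    parent: Duplex = (frozenset({1, 4}), frozenset({1, 4}))
    daughters = [maintenance_methylate(d) for d in replicate(parent)]
    print(all(d == parent for d in daughters))
```

Both daughter duplexes end up with the parental pattern, which is why the methylation state of a CpG can be inherited through cell division without any change in DNA sequence.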

The pattern of 5-methylcytosine distribution in the genome of differentiated somatic cells varies according to cell type, but maintenance methylation ensures that methylation patterns in individual somatic cell lineages are quite stable. During gametogenesis and in the developing embryo, however, there are dramatic changes in methylation (Razin and Kafri, 1994). The genomes of the primordial germ cells of the embryo are not methylated to any significant extent. After gonadal differentiation and as the germ cells begin to develop, de novo methylation occurs, leading to substantial methylation of the DNA of mammalian sperm and egg cells (Figure 8.19). The sperm genome is more heavily methylated than the egg's genome, and sex-specific differences in methylation patterns are found, notably at imprinted loci (see Mertineit et al., 1998 for references). The product of the Dnmt1 gene, in addition to being the predominant maintenance DNA methyltransferase in mammalian cells, may also be the major de novo methyltransferase. The gene is highly expressed in male germ cells, mature oocytes and in the early embryo. Dnmt1 gene expression has been shown to be subject to sex-specific regulation, with oocyte- and spermatocyte-specific promoters introducing oocyte- and spermatocyte-specific exons, leading to different gene products (Figure 8.20; Mertineit et al., 1998).

Figure 8.19. Changes in DNA methylation during mammalian development. Developmental stages for gametogenesis and early embryo development are expanded for clarity; those for later development are contracted, as indicated by double slashes. Note the very rapid changes (more...)

The genome of the fertilized oocyte is an aggregate of the sperm and egg genomes and so it and the very early embryo are substantially methylated with methylation differences at paternal and maternal alleles of many genes. Later on, at the morula and early blastula stages in the preimplantation embryo, genome-wide demethylation occurs (Figure 8.19). Later still, at the pregastrulation stage, there is widespread de novo methylation. However, the extent of this methylation varies in different cell lineages:

  • the somatic cell lineage is heavily methylated;
  • trophoblast-derived lineages which give rise to the placenta, yolk sac, etc., are undermethylated;
  • early primordial germ cells are spared; their genomic DNA remains very largely unmethylated until after gonadal differentiation and as the germ cells develop whereupon widespread de novo methylation occurs.

8.4.3. DNA methylation in animals has been thought to act as a form of host defense against transposons as well as a way of perpetuating patterns of transcriptional repression

Although not all eukaryotes appear to be subject to DNA methylation, its function in animal cells does appear to be critically important, and targeted mutagenesis of the cytosine methyltransferase gene in mice results in embryonic lethality. The precise function of DNA methylation in animal cells, however, remains unclear. Current views have focused in particular on two aspects of animal cells: the genome size (animals have comparatively large genomes with large numbers of genes, and also large numbers of highly repetitive DNA families belonging to the transposon class); and the mode of development (especially the variation in terms of lifespan and rate of cell turnover). Two quite contrasting views regarding the primary function of DNA methylation in animal cells have been the subject of much controversy: the host defense model and the gene regulation model.

Host defense as a primary function for DNA methylation

Like the restriction-modification function of DNA methylation in bacteria (see Box 4.1), the host-defense model envisages that the primary function of DNA methylation in animal cells is to confer a form of genome protection, but in this case checking the spread of transposons (Yoder et al., 1997). About one-third of the DNA sequence in the human genome can be classified as belonging to (retro)transposon families and a small fraction of these sequences in the human genome and other genomes is known to be actively transposing (see Section 7.4). Transposon families in the human and other genomes are known to be heavily methylated (about 90% of the 5-methylcytosines are thought to be located in retrotransposon families) and so DNA methylation has been viewed as a mechanism for repressing such transposition, which if left unchecked could be expected to be damaging to cells. However, recently obtained data from an invertebrate chordate, Ciona intestinalis, appear to be inconsistent with the genome defense model: multiple copies of an apparently active retrotransposon and a large fraction of highly repeated SINEs were predominantly unmethylated, while genes, by contrast, appeared to be methylated (Simmen et al., 1999).

Gene regulation as the primary function for DNA methylation

DNA methylation in vertebrates has been viewed as a mechanism for silencing transcription and may constitute a default position. DNA sequences which are transcriptionally active need to be unmethylated (at least at the promoter regions). While DNA methylation in invertebrates may serve to repress transposons and other repeated sequence families (see above), it may have acquired a special role in vertebrates as a mechanism for regulating expression of endogenous genes and reducing transcriptional noise (by silencing a large fraction of genes whose activity is not required in a cell). By reducing unnecessary gene expression, DNA methylation may have permitted the increase in gene number and in complexity that characterizes vertebrates (Bird, 1995). The counterargument is that the methylation status of the 5′ regions of tissue-specific genes cannot be correlated with expression in different tissues, and that the role of methylation in gene expression is in specialized biological functions resulting from mechanisms (e.g. imprinting) which use allele-specific gene expression (Walsh and Bestor, 1999).

DNA methylation and gene expression

The DNA of transcriptionally active and inactive chromatin differs in a number of features including the degree of compaction and the extent of its methylation (Table 8.7). While methylation of CpG islands downstream of promoters does not block continued transcription through these regions (Jones, 1999), there is no doubt that methylated promoter regions are correlated with transcriptional silencing. In addition, the extent of histone acetylation is an important factor. Specific histone acetyltransferases add acetyl groups to lysine residues close to the N terminus of histone proteins. The acetylated N termini then form tails that protrude from the nucleosome core. As the acetylated histones are thought to have a reduced affinity for the DNA, and possibly for each other, the chromatin may be able to adopt a more open structure that is more suited to gene expression. Deacetylation of the histones, however, promotes repression of gene expression presumably because the chromatin can become more condensed.

Table 8.7. Features associated with transcriptionally active and inactive chromatin.

Recently, the processes of DNA methylation and histone deacetylation have been shown to be linked. Repression at methylated CpG sequences in promoter regions appears to be mediated by proteins which specifically bind to methylated CpG. Two of these proteins have been identified, MeCP1 and MeCP2 (methylated CpG binding proteins 1 and 2), and the latter has been shown to be essential for embryonic development and to function as a transcriptional repressor. The ability of MeCP2 to repress gene expression has been shown to involve a histone deacetylase complex (Ng and Bird, 1999). One possible model envisages that an initial signal for transcriptional repression is the binding of MeCP2 to methylated CpG promoter sequences. The bound MeCP2 protein is then recognized by a complex consisting of a transcriptional repressor and a histone deacetylase which removes the acetyl groups from the N termini of the histones, so that the chromatin becomes more condensed (Figure 8.21).

Figure 8.21. Transcriptional repression by histone deacetylation may be mediated by DNA methylation. CpG dinucleotides are targets for DNA methylation and, in turn, methylated CpGs are targets for specific binding by proteins such as MeCP2, which acts as a transcriptional ...

8.5. Long-range control of gene expression and imprinting

8.5.1. Chromatin structure may exert long-range control over gene expression

A predominant theme in eukaryotic gene expression, distinguishing it from bacterial gene expression, has been that genes are individually transcribed. Promoters and related upstream elements typically control expression of a single gene with a transcription start point located within 1 kb of the control element. Some cis-acting elements, however, exert long-range control over a much larger chromosomal region and there is increasing evidence for coordinate regulation of gene clusters. Studies where genes are repositioned elsewhere in the genome have also suggested that chromosomes are organized into functional domains of gene expression (chromatin domains). For example, when genes are translocated to new chromosomal regions (either as a result of spontaneous chromosome breakage events, or by genetically manipulating model organisms - see Section 21.3), aberrant gene expression may often occur, even though the entire gene and the required control sequences in its immediate flanking sequences are preserved intact. Neighboring chromosome domains are envisaged to be separated by boundary elements (also called insulators) which act as barriers to the effects of distal enhancers and silencers (Geyer, 1997).

Competition for enhancers or silencers

Sometimes long-range control of gene expression appears to depend on competition between clustered genes for an enhancer. This appears to be a feature of globin gene expression as described in Section 8.5.2.

Heterochromatin-induced position effects

Studies of chromosomal rearrangements in Drosophila have shown that proximity to centromeres, telomeres or heterochromatic blocks may suppress gene expression, presumably by altering the structure of a large chromatin domain. Facioscapulohumeral muscular dystrophy (FSHD; MIM 158900) is a possible example of a similar position effect in man. The gene for this autosomal dominant progressive neuromuscular disease maps close to the telomere of chromosome 4q. When Southern blots of EcoRI-digested DNA are hybridized to a subtelomeric probe p13H-11, a very large hybridizing fragment of over 30 kb is seen. DNA from FSHD patients consistently shows smaller bands of 14–28 kb. These patients have a reduced copy number of a 3.2 kb repetitive sequence that is recognized by the probe, and de novo deletions of stretches of the repeats have been observed in FSHD patients with no previous family history.

Hopes that the 3.2 kb sequence would contain the FSHD gene, however, have been disappointed. Even though it contains part of a homeodomain (Hewitt et al., 1994), there is no evidence that any part of it is transcribed or expressed. Most probably the FSHD gene is located proximally to the tandem 3.2 kb repeats, and the deletions move it closer to the 4q telomere, where it is silenced by a position effect.

Other position effects

Evidence of long-range effects controlling gene expression over large chromosomal domains has emerged from studies of disease-associated chromosome breakpoints in humans (Kleinjan and van Heyningen, 1998). Examples are aniridia (AN1; MIM 106210), which is caused by loss-of-function mutations of the PAX6 gene on 11p13, and campomelic dysplasia (CMPD1; MIM 211970), which is caused by mutations in the SOX9 gene on 17q24. In each case, affected patients are known who have clearly causative chromosomal breaks, but the breakpoints may be very distant (hundreds of kilobases) from the gene whose expression is affected and do not physically disrupt it. It seems likely that expression of the gene is suppressed by a long-range effect analogous to the classic position effects described above, reflecting the novel chromosomal environment created by the translocation.

Prader-Willi and Angelman syndromes (see Section 16.4.2) bring together position effects, imprinting and DNA methylation. A cis-acting sequence analogous to the globin locus control region has been identified which governs parent-specific methylation and gene expression of a megabase-size chromosomal region at 15q11.

X inactivation

X chromosome inactivation in mammals appears to be initiated by a single gene, XIST, which is uniquely expressed on the inactivated X chromosome (see Section 8.5.6). This effect is not understood but must be mediated by some sort of long-range chromatin structural change. This is so because a diffusible XIST-mediated agent would not be able to affect just the X chromosome on which the XIST gene is expressed.

8.5.2. Expression of individual genes in gene clusters may be coordinated by a common locus control region

Some human gene clusters show evidence of coordinated expression of the individual genes in the cluster. For example, individual genes in the α-globin, β-globin and the four HOX gene clusters are activated sequentially in a temporal sequence that corresponds exactly with their linear order on the chromosome. In the case of the globin genes, there is a clear developmental stage-specific expression: different genes can be active at the embryonic, fetal or adult stages to generate slightly different forms of hemoglobin (hemoglobin switching; Figure 8.22).

Figure 8.22. Human hemoglobin switching occurs at two distinct developmental stages.

Recently, it has become apparent that the expression of the genes in each of the two human globin gene clusters is coordinated by a dominant control region, the locus control region (LCR) which is located some distance upstream of the gene cluster (see Grosveld et al., 1993). Such cluster-specific LCRs are thought to organize the cluster into an active chromatin domain and to act as enhancers of globin gene transcription. The open conformation of transcriptionally active chromatin domains makes them more accessible to cleavage by the enzyme DNase I. Consistent with this relationship, the β-globin LCR has been considered to comprise short sequences at three major erythroid-specific DNase I-hypersensitive sites (HS2, HS3 and HS4) clustered over a 15 kb region located about 50–60 kb upstream of the β-globin gene, while the α-globin LCR has been identified to occur at an erythroid-specific DNase-hypersensitive site, HS-40, located 60 kb upstream of the α-globin gene (Figure 8.23). Each site marks the location of what is effectively an enhancer sequence of about 200–300 bp of DNA which contains short cis-acting sequence elements, including multiple sequence elements recognized by erythroid-specific transcription factors (see Figure 8.6). Without the respective LCRs, globin gene expression is negligible and, in the case of the β-globin LCR, it appears that the HS2, HS3 and HS4 elements interact with each other to form a larger complex that interacts with the individual globin genes.

Figure 8.23. Gene expression in the α- and β-globin gene clusters is controlled by common locus control regions. (A) Organization of the human α- and β-globin gene clusters. The locus control regions (LCRs) consist of one or more erythroid-specific ...

Other DNase I-hypersensitive sites are located at the promoters of the globin genes, but show developmental stage specificity. For example, in fetal liver, the promoters of the two γ genes and of the β and δ genes are marked by DNase I-hypersensitive sites but, in adult bone marrow, the two γ genes are no longer transcriptionally active and their promoters no longer reveal DNase I-hypersensitive sites. Developmental stage-specific switching in globin gene expression is then thought to be accomplished by competition between the globin genes for interaction with their respective LCR and stage-specific activation of gene-specific silencer elements. For example, transcription of the ε-globin gene (HBE1) is preferentially stimulated by the neighboring LCR at the embryonic stage. In the fetus, however, ε-globin expression is suppressed following activation of a silencer and γ-globin expression becomes dominant (Figure 8.23).

In addition to the human globin LCRs, a number of other LCRs have been identified (see Kioussis and Festenstein, 1997). However, the dominant role proposed for the human β-globin LCR (a view that has relied on analysis of human transgenes) has been challenged by gene targeting studies in mice, which show that the β-globin LCR has a contributory function rather than a dominant one, and that it is not required for initiation of DNase sensitivity and expression of the genes in the mouse β-globin cluster. These contradictory findings may mean that there is a functional difference between the human and murine LCRs (see Grosveld, 1999 for a review).

8.5.3. Some human genes show selective expression of only one of the two parental alleles

X-linked genes in females and all autosomal genes are biallelic because both father and mother normally contribute one allele each. In males possessing one X chromosome and one Y chromosome, the great majority of sex-linked genes are monoallelic: most of the many genes on the X do not have a functional homolog on the Y chromosome; and some of the few genes on the Y chromosome are known to be Y-specific, for example SRY, the major male sex-determining locus. A few genes on the Y chromosome do have functional homologs on the X chromosome and so are biallelic. In some cases of X-Y homologous loci, both homologs are normally functional (see Section 14.3.1 and Figure 14.9).

We are accustomed to assuming that both the paternal and maternal alleles of biallelic genes are expressed, unless one or both copies have sustained mutations which affect expression. Clearly the expression can be tissue-specific, so that in some cells both parental alleles are strongly expressed, while in others neither gene copy is apparently expressed. Thus, although there may be cell type-specific differences in expression, there is no discrimination between the capacity of the two parental alleles to be expressed, other than that due to genetic (mutational) differences between them. However, in humans and other mammals, several biallelic genes are known where the expression of one parental allele, either the paternal or the maternal allele but not both, is normally repressed in some cells (allelic exclusion). In such cells the relevant gene is said to exhibit functional hemizygosity: only one half of the maximum gene product is normally obtained, even though the sequences of both parental alleles are perfectly consistent with normal gene expression and may even be identical. In some cases the allelic exclusion may be a property of select cells or tissues, while in other cells of the same individual both alleles may be expressed normally.

Although initially considered a rarity, monoallelic expression of biallelic genes has been demonstrated for a growing number of human genes. A variety of mechanisms can be responsible, and they fall into two broad classes (see Chess, 1998; Ohlsson et al., 1998):

  • Allelic exclusion according to parent of origin (imprinting). In some cases, the choice of which of the two inherited copies is expressed is not random. This means that for some genes the allele whose expression is repressed is always the paternally inherited allele; in others it is always the maternally inherited allele (see Section 8.5.4).
  • Allelic exclusion independent of parent of origin. Here the decision as to which of the two alleles is repressed is initially made randomly, but afterwards that pattern of allelic exclusion is transmitted stably to daughter cells following cell division. A variety of different mechanisms may be involved (see Box 8.6). In some cases, complex gene regulation may be required. For example, olfactory receptor genes are found in large clusters or arrays in mammalian genomes. In an individual olfactory neuron only one allelic array of olfactory receptor genes is active (see Chess, 1998). A unique form of control is the programmed DNA rearrangements which are required for individual cell-specific expression of the immunoglobulin genes in B cells and the T cell receptor genes in T lymphocytes. Because of the complexity of the latter mechanisms they are discussed separately in Section 8.6.

Box 8.6. Mechanisms resulting in monoallelic expression from biallelic genes in human (mammalian) cells.

8.5.4. Genomic imprinting involves differences in the expression of alleles according to parent of origin

Various observations in mammals have suggested that the maternal and paternal genomes in an individual are not equivalent (Box 8.7). In addition to genetic differences between the DNA of the sperm and oocyte genomes, there are also epigenetic differences. A major difference is in both the total amount of DNA methylation (the sperm genome is more extensively methylated than the oocyte genome) and the pattern of DNA methylation in specific DNA sequence classes. For example, LINE-1 sequences are highly methylated in sperm cells but only partially methylated in the oocyte (Razin and Kafri, 1994; Yoder et al., 1997). At some individual gene loci, too, there are major differences between the extent of methylation of paternal and maternal alleles. For example, the paternal allele of the H19 gene is heavily methylated; the maternal allele is undermethylated.

Box 8.7. The nonequivalence of the maternal and paternal genomes. In addition to the obvious X/Y sex chromosome difference, nonequivalence between paternally and maternally inherited autosomes and X chromosomes is indicated from the observations listed below. ...

As suggested by the observations in Box 8.7, differences between the paternal and maternal genomes lead to differences in expression between paternal and maternal alleles. Genomic imprinting (also called gametic or parental imprinting) in mammals describes the situation where there is nonequivalence in expression of alleles at certain gene loci, dependent on the parent of origin (Reik and Walter, 1998; Brannan and Bartolomei, 1999; Tilghman, 1999). In all (or at least some) of the tissues where the gene is expressed, the expression of either the paternally inherited allele or the maternally inherited allele, is consistently repressed, resulting in monoallelic expression. The same pattern of monoallelic expression can be faithfully transmitted to daughter cells following cell division. However, as the nucleotide sequence of the allele whose expression is repressed may be perfectly consistent with gene expression (and may even be identical to that of the expressed allele), this is an epigenetic phenomenon, not a genetic one.

Prevalence and evolution of imprinting

Most human genes are not subject to imprinting, otherwise we would not see so many simple mendelian characters. Systematic surveys have been made to identify imprinted chromosomal regions in the mouse. Unlike in humans, all mouse chromosomes are acrocentric and Robertsonian translocations can permit crosses to be set up which produce offspring having both copies of one particular chromosome derived from a single parent (uniparental disomy, UPD, see Section 2.6.4). These reveal that UPD for some chromosomes has no phenotypic effect; for others it produces abnormal phenotypes. The abnormal phenotypes are sometimes complementary for different parental origins, e.g. overgrowth is often seen in paternal UPD and growth retardation in maternal UPD. For some chromosomes, UPD is lethal.

Further dissection at the chromosomal and genetic levels shows that imprinting is a property of a limited number of individual genes or small chromosomal regions. Currently, a total of over 30 genes are known to be imprinted in humans and mice (electronic reference 1), but the list can be expected to grow. Thus far, two major clusters of imprinted genes are known in the human genome: a 1 Mb region at 11p15 (encompassing the Beckwith-Wiedemann region) contains at least seven imprinted genes which may be arranged in two clusters (Lee et al., 1999); a 2.3 Mb cluster in the 15q11-q13 region (encompassing the Prader-Willi and Angelman syndrome loci) also contains at least seven imprinted genes (Schweizer et al., 1999; see Figure 8.24). The imprinted gene clusters contain examples of neighboring genes with different parental imprints, e.g. the H19 gene is expressed only from the maternal chromosome 11 whilst the adjacent IGF2 gene is expressed only from the paternal chromosome.

Figure 8.24. Imprinted gene clusters in 11p15.5 and 15q11-q13. Genes not known to be imprinted are shown in black; imprinted genes are in blue. Genes shown as solid blue boxes, e.g. KCNQ1, UBE3A, show preferential repression of paternal alleles (so that in some tissues ...

The great majority of known imprinted genes are autosomal. However, the XIST gene which has a major role in establishing X chromosome inactivation (see next section) may be considered an example of an imprinted X-linked gene since expression of the maternally inherited allele is preferentially repressed in trophoblast. An imprinted X-linked gene which affects cognitive function has also been suggested from differential behavior patterns in Turner syndrome. Girls with Turner syndrome lack a Y chromosome and have only one X chromosome. If the X chromosome is inherited from the mother, socially disruptive behavior is common, but if it is inherited from the father, the girl shows behavior closer to normal for a girl (Skuse et al., 1997).

Imprinting is known to occur in seed plants, some insects and mammals. No major imprinting effect, as judged by phenotype, has been observed in some model organisms such as Drosophila, C. elegans and the zebrafish, although the potential for imprinting may exist in Drosophila. Mammals are unusual in the way in which embryos are totally dependent on flow of nutrients from the maternal placenta. As many imprinted genes are involved in regulating fetal growth, one explanation envisages conflict between the parental genomes: the paternal genome propagates itself best by creating an embryo which aggressively removes nutrients from the mother; the maternal genome suppresses this to protect the mother and spare some resources for future offspring. As seen in cases of uniparental diploidy (see Box 8.7), paternal genes are preferentially expressed in the trophoblast and extraembryonic membranes, while maternal genes are preferentially expressed in the embryo.

8.5.5. The mechanism of genomic imprinting is unclear but a key component appears to be DNA methylation

To confirm imprinting of a gene, it is necessary to identify an individual who is heterozygous for a sequence variant present in the mature mRNA. mRNA from different tissues can then be checked for monoallelic or biallelic expression, and the origin of each allele determined by typing the parents. For some genes, this type of analysis has shown that imprinting is confined to only certain tissues or to certain stages of development (Table 8.8). Thus imprinting allows an extra level of control of gene expression, but it is not possible to compress its functioning into a simple uniform story.
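
The logic of this test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `expressed_allele_origin`, the single-letter allele labels and the returned strings are hypothetical, and real assays must also deal with partial (leaky) repression and genotyping error, which are ignored here.

```python
def expressed_allele_origin(child, mrna_alleles, mother, father):
    """Classify expression at one locus in one tissue (hypothetical helper).

    child, mother, father: genotypes as pairs of allele labels, e.g. ("A", "G").
    mrna_alleles: allele labels detected in mature mRNA from the tissue.
    """
    a1, a2 = child
    if a1 == a2:
        # A homozygote cannot reveal which allele the mRNA came from.
        return "uninformative"
    seen = set(mrna_alleles) & {a1, a2}
    if len(seen) == 2:
        return "biallelic"
    if not seen:
        return "not expressed"
    expressed = seen.pop()
    silent = a2 if expressed == a1 else a1
    # Type the parents: which transmission pattern is compatible with them?
    maternal_ok = expressed in mother and silent in father
    paternal_ok = expressed in father and silent in mother
    if maternal_ok and not paternal_ok:
        return "monoallelic, maternal allele expressed"
    if paternal_ok and not maternal_ok:
        return "monoallelic, paternal allele expressed"
    return "monoallelic, parental origin unresolved"


print(expressed_allele_origin(("A", "G"), {"A"}, ("A", "A"), ("G", "G")))
```

Running the same function on mRNA from several tissues of one individual would show the tissue-restricted imprinting described above as a mixture of "biallelic" and "monoallelic" results.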

Table 8.8. Examples of tissue and developmental stage regulation of imprinted genes in mammals.

The above observations suggest that some mechanism must be able to distinguish between maternally and paternally inherited alleles: as chromosomes pass through the male and female germlines they must acquire some imprint to signal a difference between paternal and maternal alleles in the developing organism. A key component, at least in maintaining the imprinted status, is allele-specific DNA methylation (Brannan and Bartolomei, 1999; Tilghman, 1999). The imprinting of several imprinted genes has been shown to be disrupted in mutant mice that are deficient in the Dnmt1 cytosine methyltransferase gene and all imprinted genes are characterized by CG-rich regions of differential methylation.

Intriguingly, Dnmt1 is known to have sex-specific exons (Section 8.4.2 and Figure 8.20). In oocytes this results in an oocyte-specific amino-terminal truncated protein product which conceivably could specifically methylate the maternal alleles of genes such as the insulin-like growth factor II receptor. The spermatocyte-specific exon of Dnmt1 interferes with translation of Dnmt1 mRNA, and it is less clear how paternal-specific patterns of methylation could be acquired. During development, the imprint would be expected to be stably inherited at least for many rounds of DNA replication (but see below). Clearly, there must also be a mechanism for erasing the imprints during transmission through the germline, as required when, for example, a man passes on an allele which he has inherited from his mother (Figure 8.25). Again, however, one can envisage the demethylation that occurs in the early embryo as one way of achieving this, leaving the primordial germ cells essentially unmethylated (see Figure 8.19).

Figure 8.25. Genomic (gametic) imprinting requires erasure of the imprint in the germline. The diagram illustrates the fate of a chromosome carrying two genes, A and B, which are subject to imprinting: A is imprinted in the female germline, B is imprinted in the male ...
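
The erase-and-reset requirement can be made concrete with a toy model. The `Allele` class and `transmit` function below are hypothetical names introduced only for illustration; the sketch assumes that the imprint is a single label that is wiped and rewritten according to the sex of the transmitting parent, and says nothing about the molecular mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allele:
    sequence_id: str  # the DNA sequence itself, unchanged by imprinting
    imprint: str      # "maternal" or "paternal" epigenetic mark


def transmit(allele: Allele, parent_sex: str) -> Allele:
    """Pass an allele through a parent's germline (toy model).

    The old imprint is erased and a new one is set according to the sex
    of the transmitting parent, whatever the allele's previous history.
    """
    if parent_sex not in ("female", "male"):
        raise ValueError("parent_sex must be 'female' or 'male'")
    new_imprint = "maternal" if parent_sex == "female" else "paternal"
    return Allele(allele.sequence_id, new_imprint)


# A man inherits an allele from his mother and passes it to his child.
from_grandmother = Allele("allele-1", "maternal")
in_child = transmit(from_grandmother, "male")
print(in_child)
```

The nucleotide sequence is identical before and after transmission; only the mark changes, which is why imprinting is described as epigenetic.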

The timing of imprinting has become clearer recently. In the female germline, the maternal imprint, including the maternal pattern of methylation, is likely to be established during oocyte maturation, which is consistent with the finding that the Dnmt1 protein is not detectable in nongrowing oocytes but is produced abundantly in growing oocytes. In the male germline, the functional paternal imprint is likely to be established prior to meiosis, possibly in the postmitotic primary spermatocyte (Brannan and Bartolomei, 1999).

Imprinted genes frequently reside in clusters with genes expressed on opposite chromosomes often located next to each other, and often containing genes which appear to encode a mature RNA (see Figure 8.24). Adjacent genes appear to be jointly regulated. In the case of the Prader-Willi/Angelman syndrome cluster on 15q11-q13, for example, a single region adjacent to the SNRPN gene, termed the imprinting center, is the dominant regulatory sequence and appears to act over comparatively large distances (see Figure 8.24, Brannan and Bartolomei, 1999; Tilghman, 1999). However, different mechanisms may be found in different imprinted clusters, or even within a single cluster. For example, the mouse H19, Igf2 and Ins2 genes are jointly regulated, sharing two endodermal-specific enhancers that are located 3′ to H19. But other genes in the cluster are not subject to this control, suggesting that multiple control mechanisms may occur within an imprinted gene cluster. Different imprinting mechanisms have been considered in relation to the H19/Igf2 regulation, including enhancer competition, but to explain a variety of contradictory findings, an imprinting center adjacent to H19 has been considered to function as a chromatin boundary (insulator) element (Tilghman, 1999).

8.5.6. X chromosome inactivation in mammals involves very long-range cis-acting repression of gene expression

Nature of X chromosome inactivation

X chromosome inactivation is a process that occurs in all mammals, resulting in selective inactivation of alleles on one of the two X chromosomes in females (Migeon, 1994; Lyon, 1999). It provides a mechanism of dosage compensation which overcomes sex differences in the expected ratio of autosomal gene dosage to X chromosome gene dosage (which is 2:1 in males but 1:1 in females). Males with a single X chromosome are constitutionally hemizygous for X chromosome genes, but females become functionally hemizygous by inactivating one of the parental X chromosome alleles (see also Section 2.2.3). Not all genes on the X chromosome are subject to inactivation; genes which escape X-inactivation include ones where there is a functional homolog on the Y chromosome, and some genes where gene dosage does not seem to be important (see Figure 14.12 for examples of genes which escape X-inactivation).

In rare individuals with an abnormal number of X chromosomes (45,X; 47,XXX; 47,XXY, etc.), a single X chromosome remains active no matter how many are present. By contrast, in triploid individuals either one or two X chromosomes remain active and in tetraploids two X chromosomes remain active. Thus, there must be some kind of counting mechanism to ensure that one X chromosome remains active for every two sets of autosomes.
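
The counting rule can be written out as a small sketch. The function name `active_x_options` and its arguments are illustrative assumptions, not part of any published model; the code simply encodes "one active X per two sets of autosomes" and, where that ratio is not a whole number (as in triploids), returns both neighboring integers.

```python
import math
from typing import List


def active_x_options(autosome_sets: int, x_count: int) -> List[int]:
    """Possible numbers of active X chromosomes under the counting rule.

    autosome_sets: ploidy (2 for diploid, 3 for triploid, ...).
    x_count: number of X chromosomes present in the cell.
    """
    if autosome_sets < 1 or x_count < 1:
        raise ValueError("need at least one autosome set and one X chromosome")
    target = autosome_sets / 2  # one active X per two autosome sets
    low = max(1, math.floor(target))
    high = max(1, math.ceil(target))
    # A cell cannot keep more X chromosomes active than it possesses.
    return sorted({min(low, x_count), min(high, x_count)})


print(active_x_options(2, 1))  # 45,X
print(active_x_options(2, 3))  # 47,XXX
print(active_x_options(3, 3))  # triploid with three X chromosomes
print(active_x_options(4, 4))  # tetraploid with four X chromosomes
```

The lower bound of one active X is an assumption added so that the function also returns a sensible value for a single autosome set.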

In mammals, both X chromosomes are active in the early female embryo. X-inactivation occurs at an early stage in development, being initiated at the late blastula stage in mice, and most likely also in humans. In each cell that will give rise to the female fetus, one of the two parental X chromosomes is randomly inactivated (note that trophoblast cells are an exception; the paternal X chromosome is preferentially inactivated, which is a classical example of tissue-restricted imprinting). After the paternal or maternal X chromosome is inactivated in a cell, the same X chromosome usually remains inactive in all progeny cells, that is the X chromosome inactivation pattern is clonally inherited (see Figure 2.6). This means that female mammals are mosaics, comprising mixtures of cell lines in which the paternal X is inactivated and cell lines where the maternally inherited X is inactivated. In addition to X chromosome inactivation in female somatic cells, the X chromosome is known to be inactivated transiently during gametogenesis in both males and females.
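
Random choice followed by clonal inheritance can be illustrated with a minimal simulation. The names `inactivate_and_expand`, `"Xp"` and `"Xm"` are hypothetical, the 50:50 choice and the synchronous doubling of every clone are simplifying assumptions, and the trophoblast exception noted above is not modeled.

```python
import random


def inactivate_and_expand(n_founders, divisions, seed=0):
    """Toy model of X-inactivation mosaicism.

    Each founder cell independently silences its paternal ("Xp") or
    maternal ("Xm") X; all descendants inherit that choice unchanged.
    Returns the founder choices and the resulting clones.
    """
    rng = random.Random(seed)
    founders = [rng.choice(["Xp", "Xm"]) for _ in range(n_founders)]
    clones = []
    for inactive_x in founders:
        # Every round of division doubles the clone; the inactive X is
        # copied to both daughters, so the clone stays uniform.
        clone_size = 2 ** divisions
        clones.append([inactive_x] * clone_size)
    return founders, clones


founders, clones = inactivate_and_expand(n_founders=8, divisions=3, seed=1)
for choice, clone in zip(founders, clones):
    print(choice, "inactive in a clone of", len(clone), "cells")
```

Each clone is uniform, but the individual as a whole is a patchwork of clones with different inactive X chromosomes, which is the mosaicism described in the text.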

Mechanism of X chromosome inactivation

The process of X chromosome inactivation is complex, and distinct molecular mechanisms are involved in initiation of inactivation and maintenance of the inactivation. The X-inactivation center (Xic), which in humans is located at Xq13, controls the initiation and propagation of X-inactivation. At this center, the XIST gene (called Xist in rodents) encodes a mature 15 kb RNA product which is expressed only from the inactive X chromosome. XIST/Xist is therefore another example of a gene that is subject to monoallelic expression. In the cells of the early embryo, the decision regarding which X chromosome to inactivate is made randomly, and so the allelic exclusion which XIST/Xist shows in these cells is independent of parent of origin.

XIST/Xist is essential for Xic function in initiating X chromosome inactivation but is not required for maintaining X chromosome inactivation. Somehow cis-limited spreading of this RNA product acts so as to coat the inactivated X chromosome over very long distances. In rodents, coating of the Xist RNA gives the inactivated X chromosome a banded pattern suggesting a preferential association with gene-rich Giemsa-minus regions. However, the mechanism of ensuring inactivation of genes on the inactive X but not on the active X is unknown (see Duthie et al., 1999 for possible models).

Although the Xist gene is essential for Xic function, Xist alone is not sufficient. The X-controlling element (Xce) affects the choice of which X chromosome remains active, and is distinct from Xist, being located 3′ to it. In addition, deletion of a 65-kb region 3′ to Xist produces an effect which suggests that elements involved in the counting mechanism lie distal and 3′ to Xist. Recently, another gene has been identified as being transcribed from the opposite strand to that which is used for transcribing the Xist gene. Because the transcription unit of the new gene completely overlaps the Xist gene, and is in the reverse orientation, it has been named Tsix (Lee et al., 1999). This has given rise to the idea that Xist may be regulated by the Tsix gene (see Heard et al., 1999 for possible models of Tsix regulation).

8.6. The unique organization and expression of Ig and TCR genes

The organization and expression of immunoglobulin (Ig) and T-cell receptor (TCR) genes is in many ways quite different from that of other genes. This is so because of the need for each individual to produce a huge variety of different Igs and TCRs. An individual B or T lymphocyte is monospecific and produces a single type of Ig or TCR; it is the population of different B and T cells in any one individual that enables the synthesis of so many different types of these molecules. B and T lymphocytes need to be extremely diverse because they represent the cells that provide antibody responses or cell-mediated responses to foreign antigen: by providing a large repertoire of Igs and TCRs, the possibilities for being able to recognize and bind very many different types of foreign antigen are greatly increased.

8.6.1. Ig and TCR genes exhibit a unique organization: multiple gene segments can encode each of several different regions of the polypeptide

Polypeptide structure

An Ig molecule consists of four polypeptide chains, two identical heavy chains and two identical light chains (see Figure 7.10). The light chains fall into two classes: kappa (κ) and lambda (λ) light chains, which are functionally equivalent. At the N-terminal segments of each type of chain are the so-called variable (V) regions, which need to bind foreign antigen; the remaining C-terminal segments are constant (C) regions. In the case of the heavy chains, there are different alternatives for the constant region which specify the tissues in which the Ig will be expressed and dictate the immunoglobulin class (Table 8.9). Similarly, TCRs, which provide cell-mediated immune responses to foreign antigens, consist of two types of chain. Each such chain has Ig-like variable regions which bind foreign antigen, and constant regions which anchor the molecule to the cell surface (see Figure 7.10). The most frequently occurring TCRs have an α and a β chain; a minor population consists of a γ chain and a δ chain.

Table 8.9. Ig classes and subclasses.

Gene structure

The genes which encode the different types of chain in Igs and TCRs are located on different chromosomes and are organized as clusters of numerous gene segments (Table 8.10). Each such cluster is unusual in that the coding sequences for specific segments of each chain are often present in numerous different copies that are sequentially repeated. For example, although the constant region of human κ light chain Ig is encoded by a single Cκ sequence, the variable region is encoded by a combination of a Vκ segment (which encodes most of the variable region) and a short Jκ segment (joining segment; encodes a small part at the C-terminal end of the variable region). These are selected from a total of about 76 alternative Vκ segments and five alternative Jκ segments. Although the λ light chain is similarly encoded by Vλ, Jλ and Cλ segments, the heavy chain Ig locus shows some differences. The variable region is encoded by a combination of a VH gene segment, a JH gene segment and also a DH gene segment (encoding a diversity segment), each selected from many repeated gene segments. Additionally, there are a variety of different CH sequences which specify the class of the Ig (see above). In total the heavy chain cluster comprises about 140 gene segments, of which about one-third are known to be incapable of expression, and spans about 1200 kb (Figure 8.26).

Table 8.10. Functional human Ig and TCR loci.

Figure 8.26. The Ig heavy chain locus on 14q32 contains about 86 variable (V) region sequences, 30 diversity (D) segments, nine joining (J) segments and 11 constant region (C) sequences. The entire locus spans about 1200 kb of 14q32.3 and, for clarification, is shown ...
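The segment counts quoted above give a rough feel for how many distinct V-J and V-D-J combinations the germline organization could offer. The following is a minimal sketch, not a measured repertoire size: it assumes that every listed segment is usable (the text notes that about one-third of the heavy chain segments cannot be expressed), that any heavy chain can pair with any κ light chain, and it ignores variability introduced at the junctions. The function name `segment_combinations` is made up for illustration.

```python
from math import prod

# Approximate segment counts quoted in the text and in Figure 8.26.
# They include segments that cannot be expressed, so the products
# below are upper bounds, not measured repertoire sizes.
KAPPA = {"V": 76, "J": 5}
HEAVY = {"V": 86, "D": 30, "J": 9}


def segment_combinations(counts):
    """Number of ways to pick exactly one segment of each type."""
    return prod(counts.values())


kappa_vj = segment_combinations(KAPPA)    # 76 * 5
heavy_vdj = segment_combinations(HEAVY)   # 86 * 30 * 9

# Assumption: any heavy chain may pair with any kappa light chain.
paired = kappa_vj * heavy_vdj

print(f"kappa V-J combinations:   {kappa_vj}")
print(f"heavy V-D-J combinations: {heavy_vdj}")
print(f"heavy x kappa pairings:   {paired}")
```

Under these assumptions the κ locus offers 380 V-J combinations and the heavy chain locus 23 220 V-D-J combinations.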

As each Ig gene cluster or TCR gene cluster in an individual B or T lymphocyte only ever gives rise to at most one Ig or TCR polypeptide, an entire cluster can functionally be regarded as a single, albeit unusual, type of gene. However, the individual gene segments cannot be regarded as the functional equivalent of classical exons. This is so because individual gene segments in these clusters are sometimes composed of coding DNA and noncoding DNA and may consist of several exons. For example, each of the human CH sequences is itself composed of three or four classical exons separated by introns: after transcription into RNA, the intronic sequences are discarded, and only the exonic sequences are retained in the mRNA.

8.6.2. Programmed DNA rearrangements at the Ig and TCR loci occur during the maturation of B and T lymphocytes, respectively

The unique arrangement of gene segments in the Ig and TCR gene clusters reflects a very unusual requirement: somatic recombinations must occur in B and T lymphocytes before functional Ig and TCR genes can be assembled and then expressed (see below). Such somatic recombinations bring together different combinations of gene segments in different individual lymphocytes. Consequently, they can be regarded as both tissue-specific (confined to B and T lymphocytes) and cell-specific events which involve alternative DNA splicing (as opposed to alternative RNA splicing, which brings about different combinations of exons at the RNA level; see Section 8.3.2). As a result, the original germline gene organization is altered: gene segments that were distant in the germline are spliced together at the DNA level. Because the choice of which of the many repeated gene segments are recombined to give a functional V-J or V-D-J unit is cell specific, individual B and T cells produce different Igs and TCRs. This means that, in a sense, every individual is a mosaic with respect to the organization of the Ig and TCR genes in B and T lymphocytes; even identical twins will diverge genetically.

The rearrangements which lead to the production of functional light chains and heavy chains of Igs are slightly different.

  • Making a light chain. In order to generate a functional κ light chain Ig, for example, a somatic recombination event brings together a specific combination of one of the Vκ gene segments and one of the Jκ gene segments (V-J joining). Thereafter, splicing to the single Cκ sequence occurs at the RNA level (Figure 8.27A).
  • Making a heavy chain. Two successive somatic recombinations are required, resulting first in DH-JH joining, and then VH-DH-JH joining. Subsequently the resulting VH-DH-JH coding sequence is spliced at the RNA level to the nearest CH sequence, initially Cμ (Figure 8.27B).
Figure 8.27. Igs are synthesized following somatic recombination of V and J, or V, D and J segments and subsequent RNA splicing to C sequences. (A) Light chain synthesis. Somatic recombination (DNA splicing) results in joining of a specific variable (V) segment to ...
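The two rearrangement pathways described above can be sketched as operations on an ordered list of gene segments. This is a toy model under stated assumptions: joining is treated purely as deletion of the intervening segments (inversions are ignored), the loci contain far fewer segments than in vivo, and the segment names and the functions `join`, `rearrange_light` and `rearrange_heavy` are hypothetical.

```python
def join(locus, left, right):
    """Join two segments at the DNA level by deleting what lies between them."""
    i, k = locus.index(left), locus.index(right)
    if i >= k:
        raise ValueError(f"{left} must lie upstream of {right}")
    return locus[: i + 1] + locus[k:]


def rearrange_light(locus, v, j):
    """Light chain: a single somatic recombination (V-J joining)."""
    return join(locus, v, j)


def rearrange_heavy(locus, v, d, j):
    """Heavy chain: D-J joining first, then V joins the D-J unit."""
    after_dj = join(locus, d, j)
    return join(after_dj, v, d)


# Toy germline loci (hypothetical names, listed 5' to 3').
kappa_germline = ["V1", "V2", "V3", "J1", "J2", "C"]
heavy_germline = ["V1", "V2", "V3", "D1", "D2", "J1", "J2", "Cmu"]

# The C segment is left untouched: it is joined to the V-J or V-D-J unit
# later, at the RNA level, by splicing.
print(rearrange_light(kappa_germline, "V2", "J2"))
print(rearrange_heavy(heavy_germline, "V1", "D1", "J2"))
```

Because each call picks its own V, D and J segments, two model cells that start from the same germline list can end up with different rearranged loci, which is the sense in which the choice is cell specific.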

Because there are three types of functional Ig gene loci in human cells (heavy chain, κ light chain and λ light chain), and because these occur on both maternal and paternal homologs, there are six chromosomal segments in which DNA rearrangements can result in production of an Ig chain. However, an individual B cell is monospecific: it produces only one type of Ig molecule with a single type of heavy chain and a single type of light chain. This is so for two reasons:

  • Allelic exclusion. A light chain or a heavy chain can be synthesized from a maternal chromosome or a paternal chromosome in any one B cell, but not from both parental homologs. As a result, there is monoallelic expression at the heavy chain gene locus in B cells. This phenomenon also applies to TCR gene clusters.
  • Light chain exclusion. A light chain synthesized in a single B cell may be a κ chain, or a λ chain, but never both. As a result of this requirement, plus that of allelic exclusion, there is monoallelic expression at one of the two functional light chain gene clusters and no expression at the other.

The choice of which of the two heavy chain alleles is used, and which of the four possible chromosomal segments is used to make a light chain, appears to be random. Most likely, in each B-cell precursor, productive DNA rearrangements are attempted at all six Ig alleles but the chances of productive rearrangements in more than one light chain cluster or more than one heavy chain allele may not be high. Additionally, however, there appears to be some kind of negative feedback regulation: a functional rearrangement at one of the heavy chain alleles suppresses rearrangements occurring in the other allele, and a functional rearrangement at any one of the four regions capable of encoding a light chain suppresses rearrangements occurring in the other three.
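Allelic exclusion and light chain exclusion can be illustrated with a small simulation. This is a sketch under loudly stated assumptions rather than a model of the real mechanism: alleles are tried one at a time in random order, each attempt is productive with an arbitrary probability `p_productive`, and the negative feedback is represented simply by stopping at the first productive rearrangement. The allele labels and function names are hypothetical.

```python
import random

# The six chromosomal segments described in the text (labels are made up).
HEAVY_ALLELES = ["heavy-maternal", "heavy-paternal"]
LIGHT_ALLELES = ["kappa-maternal", "kappa-paternal",
                 "lambda-maternal", "lambda-paternal"]


def first_productive(alleles, p_productive, rng):
    """Try alleles in random order; stop at the first productive one.

    Stopping early stands in for negative feedback: once one allele has
    rearranged productively, the remaining alleles are not used.
    Returns the chosen allele, or None if every attempt fails.
    """
    order = list(alleles)
    rng.shuffle(order)
    for allele in order:
        if rng.random() < p_productive:   # arbitrary success probability
            return allele
    return None


def make_b_cell(p_productive=0.3, seed=None):
    """Return the single heavy and single light allele a model cell expresses."""
    rng = random.Random(seed)
    return {
        "heavy": first_productive(HEAVY_ALLELES, p_productive, rng),
        "light": first_productive(LIGHT_ALLELES, p_productive, rng),
    }


cells = [make_b_cell(seed=s) for s in range(1000)]
usable = [c for c in cells if c["heavy"] and c["light"]]
print(f"{len(usable)} of {len(cells)} model cells express one heavy and one light chain")
```

By construction each model cell expresses at most one heavy chain allele and at most one of the four light chain alleles, so it is monospecific; cells in which every attempt fails express no complete Ig in this sketch.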

8.6.3. V-J and V-D-J joining is often achieved by intrachromatid deletions, and also by megabase inversions in the former case

The genetic mechanism leading to V-J and V-D-J joining often involves large-scale deletions which are thought to occur by an intrachromatid recombination event, similar to those used in V-D-J-C joining (see next section). In addition, V-J joining often occurs as a result of megabase inversions. The human κ light chain gene locus spans about 1840 kb on 2p12 and includes about 76 Vκ segments, mostly comprising pairs of duplicated V gene segments, organized as two clusters: a proximal cluster located adjacent to the Jκ segments and to the Cκ segment, and a distal cluster. The duplication reflects an inverted repeat structure: V gene segments in the proximal Vκ cluster usually have a corresponding duplicate in the distal Vκ cluster, which is separated from the proximal cluster by about 800 kb and lies in the opposite orientation (Figure 8.28). Depending on which V segment cluster is involved, V-J joining occurs by two possible routes:

Figure 8.28. Inversion or deletion results in V-J splicing to produce functional Ig κ light chain genes. The human κ light chain gene cluster contains about 76 Vκ segments arranged in two large clusters, in opposite orientations. V segments ...

  • V segments in the distal cluster are joined to J segments by inversions.
  • V segments in the proximal cluster are joined to J segments by deletions (Figure 8.28).

Note that the joining process is imprecise, and so can also introduce a measure of variability in the sequence at the junctions of joined segments.
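The two routes to V-J joining can be sketched by tracking the orientation of each segment. In this toy model, which rests on several assumptions, the locus is a list of `(name, strand)` pairs, a V segment in the same orientation as the J segment is joined by deleting the intervening segments, and a V segment in the opposite orientation is joined by inverting the whole stretch from that V segment up to the J segment. Segment names and the function `vj_join` are hypothetical, and the junctional imprecision noted above is not modelled.

```python
def flip(strand):
    """Return the opposite orientation."""
    return "-" if strand == "+" else "+"


def find(locus, name):
    """Index of the named segment in the locus."""
    for idx, (segment, _) in enumerate(locus):
        if segment == name:
            return idx
    raise ValueError(f"segment {name!r} not in locus")


def vj_join(locus, v, j):
    """Join V to J; return the route used and the rearranged locus."""
    i, k = find(locus, v), find(locus, j)
    if i >= k:
        raise ValueError("V must lie upstream of J in this toy locus")
    if locus[i][1] == locus[k][1]:
        # Same orientation: the intervening segments are deleted.
        return "deletion", locus[: i + 1] + locus[k:]
    # Opposite orientation: invert the stretch from V up to (not including) J.
    # Nothing is lost; V ends up next to J and in the same orientation.
    inverted = [(seg, flip(s)) for seg, s in reversed(locus[i:k])]
    return "inversion", locus[:i] + inverted + locus[k:]


# Toy kappa locus: distal cluster inverted relative to the proximal cluster.
kappa = [("Vdist1", "-"), ("Vdist2", "-"), ("spacer", "+"),
         ("Vprox1", "+"), ("Vprox2", "+"), ("J1", "+"), ("C", "+")]

print(vj_join(kappa, "Vprox1", "J1"))
print(vj_join(kappa, "Vdist2", "J1"))
```

In the deletion route the locus becomes shorter, whereas in the inversion route every segment is retained and only the order and orientation change.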

8.6.4. Class switching of heavy chains involves differential joining of a single VDJ unit to alternative DNA segments encoding constant regions

Although a B cell produces only one type of Ig molecule, the heavy chain class (see Table 8.9) can change during the cell lineage (class switching or isotype switching). Such switching involves differential joining of the same VDJ unit that was brought together by two successive somatic recombinations (see Figure 8.27B) to different segments encoding alternative constant regions. The initial joining of a VDJ sequence to constant region segments is accomplished at the RNA level. However, subsequently, class switching involves joining the same VDJ unit at the DNA level to alternative constant regions by yet more somatic recombination events (V-D-J-C joining). Class switching involves the following progression:

  • Initial synthesis of IgM only by immature B cells. This occurs because the VDJ unit is spliced at the RNA level to a Cμ sequence (Figure 8.27B).
  • Later synthesis of both IgM and IgD by immature B cells. The partial switch to making IgD occurs because the VDJ unit can be spliced at the RNA level to a Cδ sequence, as a result of alternative RNA splicing (Figure 8.29).
  • Synthesis of IgG, IgE or IgA by mature B cells. Class switching events involve splicing the same VDJ unit to a Cγ, Cε or Cα sequence, respectively, at the DNA level as a result of a somatic recombination event (VDJ-C joining). The mechanism involves deletion of the intervening sequence by intrachromatid recombination (Figure 8.29).
Figure 8.29. Ig heavy chain class switching is mediated by intrachromatid recombination. Note that joining of the same VDJ unit to a Cμ or a Cδ sequence occurs at the level of RNA splicing to generate heavy chains for IgM and IgD respectively. In contrast, ...
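The progression of class switching described above can be sketched as a choice between two levels of joining. This is a simplified illustration: the list `GERMLINE_C` is an assumed, shortened order of constant region segments (the real locus has 11 C sequences), and the function names are hypothetical. IgM and IgD are produced from unchanged DNA by RNA splicing, whereas the other classes require a DNA-level deletion that brings a downstream C segment next to the VDJ unit.

```python
# Simplified 5'->3' order of constant-region segments downstream of VDJ.
# Assumption: one representative segment per class, not the real 11.
GERMLINE_C = ["Cmu", "Cdelta", "Cgamma", "Cepsilon", "Calpha"]

IG_CLASS = {"Cmu": "IgM", "Cdelta": "IgD", "Cgamma": "IgG",
            "Cepsilon": "IgE", "Calpha": "IgA"}

RNA_LEVEL = ("Cmu", "Cdelta")   # reached by (alternative) RNA splicing


def rna_level_classes(c_segments):
    """Ig classes a cell can make without any further DNA rearrangement."""
    if c_segments[0] in RNA_LEVEL:
        return [IG_CLASS[c] for c in c_segments if c in RNA_LEVEL]
    # After a DNA-level switch, VDJ is expressed with the nearest C segment.
    return [IG_CLASS[c_segments[0]]]


def class_switch(c_segments, target):
    """DNA-level switch: delete every C segment upstream of the target."""
    if target in RNA_LEVEL:
        raise ValueError(f"{target} is reached by RNA splicing, not by deletion")
    k = c_segments.index(target)
    return c_segments[k:]


print(rna_level_classes(GERMLINE_C))      # before any DNA-level switch
switched = class_switch(GERMLINE_C, "Cepsilon")
print(switched, rna_level_classes(switched))
```

In this sketch the same VDJ unit is kept throughout; only the list of constant region segments downstream of it changes.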

Further reading

  1. Day DA, Tuite MF. Post-transcriptional gene regulatory mechanisms in eukaryotes: an overview. J. Endocrinol. (1998);157:361–371. [PubMed: 9691970]
  2. Gray NK, Wickens M. Control of translation initiation in animals. Annu. Rev. Cell Dev. Biol. (1998);14:399–458. [PubMed: 9891789]
  3. Latchman D (1998) Gene Regulation. A Eukaryotic Perspective. Stanley Thornes, Cheltenham.
  4. Roitt I, Brostoff J, Male D (1985) Immunology. Gower Medical Publishing, London.
  5. Russo E, Martienssen RA, Riggs AD (1996) Epigenetics and Mechanisms of Gene Regulation. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
  6. Travers A (1993) DNA-Protein Interactions. Chapman & Hall, London.
  7. van Driel R, Otte AP (1997) Nuclear Organization, Chromatin Structure and Gene Expression. Oxford University Press, Oxford.

Electronic references (e-Refs)

  1. http://www​


References

  1. Ainger K, Avossa D, Diana AS, Barry C, Barbarese E, Carson JH. Transport and localization elements in myelin basic protein mRNA. J. Cell Biol. (1997);138:1077–1087. [PMC free article: PMC2136761] [PubMed: 9281585]
  2. Antequera F, Bird A. Number of CpG islands and genes in human and mouse. Proc. Natl Acad. Sci. USA. (1993);90:11995–11999. [PMC free article: PMC48112] [PubMed: 7505451]
  3. Ashkenas J. Gene regulation by mRNA editing. Am. J. Hum. Genet. (1997);60:278–283.
  4. Ayoubi TA, Van De Ven WJ. Regulation of gene expression by alternative promoters. FASEB J. (1996);10:453–460. [PubMed: 8647344]
  5. Bestor TH. Cytosine methylation and the unequal developmental potentials of the oocyte and sperm genomes. Am. J. Hum. Genet. (1998);62:1269–1273. [PMC free article: PMC1377170] [PubMed: 9585619]
  6. Bird A. Gene number, noise reduction and biological complexity. Trends Genet. (1995);11:94–100. [PubMed: 7732579]
  7. Blackwood EM, Kadonaga JT. Going the distance: a current view of enhancer action. Science. (1998);281:60–63. [PubMed: 9679020]
  8. Brannan CI, Bartolomei MS. Mechanisms of genomic imprinting. Curr. Opin. Genet. Dev. (1999);9:164–170. [PubMed: 10322141]
  9. Burke TW, Kadonaga JT. The downstream core promoter element, DPE, is conserved from Drosophila to humans and is recognized by TAFII60 of Drosophila. Genes Dev. (1997);11:3020–3031. [PMC free article: PMC316699] [PubMed: 9367984]
  10. Chess A. Expansion of the allelic exclusion principle. Science. (1998);279:2067–2068. [PubMed: 9537917]
  11. Cook GP, Tomlinson IM, Walter G, Riethman H, Carter NP, Buluwela L, Winter G, Rabbitts TH. A map of the immunoglobulin VH locus completed by analysis of the telomeric region of chromosome 14q. Nature Genet. (1994);7:162–168. [PubMed: 7920635]
  12. Cox GF, Kunkel LM. Dystrophies and heart disease. Curr. Opin. Cardiol. (1997);12:329–343. [PubMed: 9243091]
  13. Day DA, Tuite MF. Post-transcriptional gene regulatory mechanisms in eukaryotes. J. Endocrinol. (1998);157:361–371. [PubMed: 9691970]
  14. Duthie SM, Nesterova TB, Formstone EJ, Keohane AM, Turner BM, Zakian SM, Brockdorff N. XIST RNA exhibits a banded localization on the inactive X chromosome and is excluded from autosomal material in cis. Hum. Molec. Genet. (1999);8:195–204. [PubMed: 9931327]
  15. Edwards-Gilbert G, Veraldi KL, Milcarek C. Alternative poly (A) site selection in complex transcription units: means to an end? Nucleic Acids Res. (1997);13:2547–2561. [PMC free article: PMC146782]
  16. Geyer PK. The role of insulator elements in defining domains of gene expression. Curr. Opin. Genet. Dev. (1997);7:242. [PubMed: 9115431]
  17. Grosveld F, Dillon N, Higgs D. The regulation of human globin gene expression. Baillière's Clin. Haematol. (1993);6:31–55. [PubMed: 8353317]
  18. Grosveld F. Activation by locus control regions? Curr. Opin. Genet. Dev. (1999);9:152–157. [PubMed: 10322132]
  19. Hazelrigg T. The destinies and destinations of RNAs. Cell. (1998);95:451–460. [PubMed: 9827798]
  20. Heard E, Lovell-Badge R, Avner P. Anti-Xistentialism. Nature Genet. (1999);21:343–344. [PubMed: 10192375]
  21. Hewitt JE, Lyle R, Clark LN. et al. Analysis of the tandem repeat locus D4Z4 associated with facioscapulohumeral muscular dystrophy. Hum. Molec. Genet. (1994);3:1287–1295. [PubMed: 7987304]
  22. Jiang Z-H, Zhang W-J, Rao Y, Wu JY. Regulation of Ich-1 pre-mRNA alternative splicing and apoptosis by mammalian splicing factors. Proc. Natl Acad. Sci. USA. (1998);95:9155–9160. [PMC free article: PMC21308] [PubMed: 9689050]
  23. Jones PA. The DNA methylation paradox. Trends Genet. (1999);15:34–37. [PubMed: 10087932]
  24. Karin M, Hunter T. Transcriptional control by protein phosphorylation: signal transmission from the cell surface to the nucleus. Curr. Biol. (1995);5:747–757. [PubMed: 7583121]
  25. Kioussis D, Festenstein R. Locus control regions: overcoming heterochromatin-induced gene inactivation in mammals. Curr. Opin. Genet. Dev. (1997);7:614–619. [PubMed: 9388777]
  26. Klausner RD, Rouault TA, Harford JB. Regulating the fate of mRNA: the control of cellular iron metabolism. Cell. (1993);72:19–28. [PubMed: 8380757]
  27. Kleinjan D-J, van Heyningen V. Position effect in human genetic disease. Hum. Mol. Genet. (1998);7:1611–1618. [PubMed: 9735382]
  28. Larsen F, Gundersen G, Lopez R, Prydz H. CpG islands as gene markers in the human genome. Genomics. (1992);13:1095–1107. [PubMed: 1505946]
  29. Larsson SH, Charlieu JP, Miyagawa K. et al. Subnuclear localization of WT1 in splicing or transcription factor domains is regulated by alternative splicing. Cell. (1995);81:391–401. [PubMed: 7736591]
  30. Lee JT, Davidow LS, Warshawsky D. Tsix, a gene antisense to Xist at the X-inactivation centre. Nature Genet. (1999);21:400–404. [PubMed: 10192391]
  31. Lee MP, Brandenburg S, Landes GM, Adams M, Miller G, Feinberg AP. Two novel genes in the center of the 11p15 imprinted domain escape genomic imprinting. Hum. Mol. Genet. (1999);8:683–690. [PubMed: 10072438]
  32. Lopez AJ. Alternative splicing of pre-mRNA: developmental consequences and mechanisms of regulation. Annu. Rev. Genet. (1998);32:279–305. [PubMed: 9928482]
  33. Lyon MF. X-chromosome inactivation. Curr. Biol. (1999);9:R235–R237. [PubMed: 10209128]
  34. Mertineit C, Yoder JA, Taketo T, Laird DW, Trasler JM, Bestor TH. Sex-specific exons control DNA methyltransferase in mammalian germ cells. Development. (1998);125:889–897. [PubMed: 9449671]
  35. Migeon BR. X-chromosome inactivation: molecular mechanisms and genetic consequences. Trends Genet. (1994);10:230–235. [PubMed: 8091502]
  36. Ng H-H, Bird A. DNA methylation and chromatin modification. Curr. Opin. Genet. Dev. (1999);9:158–163. [PubMed: 10322130]
  37. Nikolov DB, Burley SK. RNA polymerase II transcription initiation: a structural view. Proc. Natl Acad. Sci. USA. (1997);94:15–22. [PMC free article: PMC33652] [PubMed: 8990153]
  38. Ogbourne S, Antalis TM. Transcriptional control and the role of silencers in transcriptional regulation in eukaryotes. Biochem J. (1998);331:1–14. [PMC free article: PMC1219314] [PubMed: 9512455]
  39. Ohlsson R, Tycko B, Sapienza C. Monoallelic expression: ‘there can only be one’ Trends Genet. (1998);14:435–438. [PubMed: 9825668]
  40. Pekhletsky RI, Chernov BK, Rubtsov PM. Variants of the 5′-untranslated sequence of human growth hormone receptor mRNA. Mol. Cell. Endocrinol. (1992);90:103–109.
  41. Razin A, Kafri T. DNA methylation from embryo to adult. Prog. Nucleic Acid Res. Mol. Biol. (1994);48:53–81. [PubMed: 7938554]
  42. Reik W, Walter J. Imprinting mechanisms in mammals. Curr. Opin. Genet. Dev. (1998);8:154–164. [PubMed: 9610405]
  43. Schoenherr CJ, Paquette AJ, Anderson DJ. Identification of potential target genes for the neuron-restrictive silencer function. Proc. Natl. Acad. Sci. USA. (1996);93:9881–9886. [PMC free article: PMC38523] [PubMed: 8790425]
  44. Schweizer J, Zynger D, Francke U. In vivo nuclease hypersensitivity studies reveal multiple sites of parental origin-dependent chromatin conformation in the 150 kb SNRPN transcription unit. Hum. Molec. Genet. (1999);8:555–566. [PubMed: 10072422]
  45. Simmen MW, Leitgeb S, Charlton J, Jones SJM, Harris B, Clark VH, Bird AP. Nonmethylated transposable elements and methylated genes in a chordate genome. Science. (1999);283:1164–1167. [PubMed: 10024242]
  46. Siomi H, Dreyfuss G. RNA-binding proteins as regulators of gene expression. Curr. Opin. Genet. Dev. (1997);7:345–353. [PubMed: 9229110]
  47. Skuse DH, James RS, Bishop DV. Evidence from Turner's syndrome of an imprinted X-linked locus affecting cognitive function. Nature. (1997);387:705–708. [PubMed: 9192895]
  48. Smale ST. Transcription initiation from TATA-less promoters within eukaryotic protein-coding genes. Biochim. Biophys. Acta. (1997);1351:73–88. [PubMed: 9116046]
  49. Stebbins-Boaz B, Richter JD. Translational control during early development. Crit. Rev. Eukaryot. Gene Expr. (1997);7:73–94. [PubMed: 9034716]
  50. Steinmetz EJ. Pre-mRNA processing and the CTD of RNA polymerase II: the tail that wags the dog? Cell. (1997);89:491–494. [PubMed: 9160740]
  51. Tilghman SM. The sins of the fathers and mothers: genomic imprinting in mammalian development. Cell. (1999);96:185–193. [PubMed: 9988214]
  52. Wahle E, Kuhn U. The mechanism of 3′ cleavage and polyadenylation of 3′ eukaryotic pre-mRNA. Prog. Nucleic Acid Res. Mol. Biol. (1997);57:41–71. [PubMed: 9175430]
  53. Walsh CP, Bestor TH. Cytosine methylation and mammalian development. Genes Dev. (1999);13:26–34. [PMC free article: PMC316374] [PubMed: 9887097]
  54. Weichhold GM, Klobeck H-G, Ohnheiser R, Combriato G, Zachau HG. Megabase inversions in the human genome as physiological events. Nature. (1990);347:90–92. [PubMed: 2118596]
  55. Wickens M, Anderson P, Jackson RJ. Life and death in the cytoplasm: messages from the 3′ end. Curr. Opin. Genet. Dev. (1997);7:220–232. [PubMed: 9115434]
  56. Yoder JA, Walsh CP, Bestor TH. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet. (1997);13:335–340. [PubMed: 9260521]
Copyright © 1999, Garland Science.
Bookshelf ID: NBK7588