8.1. An overview of gene expression in human cells
The control mechanisms used to regulate human gene expression are fundamentally
similar to those found in other mammals, and broadly resemble those of eukaryotes
in general. Although these mechanisms are much more complex than their equivalents in
organisms with small genomes, many of the same basic principles apply and, as in other eukaryotes, a
major level at which gene expression is controlled is the initiation of
transcription. Mammals are particularly complex multicellular organisms and so it is
perhaps unsurprising that there are some gene control mechanisms which are not used
in bacteria or in some other eukaryotes. Various regulation mechanisms are required
to maintain many different facets of mammalian gene expression, both at the spatial
and temporal levels (Box 8.1).
Although simplistic, it is convenient to consider three broad levels at which gene
regulation can operate.
Transcriptional regulation of gene expression
We have long been accustomed to the idea that a primary control of gene expression in
eukaryotes occurs at the level of initiation of transcription. Regulation of
expression can occur through the core promoter of a gene, at the level of
recruitment and processivity of the relevant RNA polymerase. Expression of genes is
initiated by the binding of transcription factors to the promoter. Basal levels of
transcription can be modulated by binding of protein factors to other regulatory
regions occurring in the sequences flanking the gene or sometimes within introns of
the gene.
Post-transcriptional regulation of gene expression
This category overlaps with the previous section since it includes mechanisms
operating at the level of RNA processing, such as RNA splicing which may more
accurately be considered as co-transcriptional rather than post-transcriptional
(Steinmetz, 1997). In addition to RNA
processing, other levels at which control of gene expression can be exerted include:
mRNA transport, translation, mRNA stability, protein processing, protein targeting,
protein stability, etc.
A surprising variety of mechanisms are employed at the level of RNA processing with
single alleles in an individual often able to generate a variety of different gene
products (isoforms). The occurrence of these and other mechanisms has
required a more flexible definition of the term gene than has
formerly been used (see below). Several mechanisms are involved in regulating gene
expression at the level of translation and an increasing number of regulatory
sequences have been identified in the 5′ and especially 3′
untranslated regions of mRNA. Control of gene expression at the level of protein
processing, targeting and stability has been shown in certain systems. For example,
activation of some peptide hormones such as insulin requires post-translational
cleavage from precursor forms (see Figure 1.23).
Epigenetic mechanisms and long range control of gene expression
Factors which can be transmitted to progeny cells following cell division, but which
are not directly attributable to the DNA sequence, are described as epigenetic. DNA methylation is an
epigenetic mechanism which plays an important part in mammalian gene control, acting
as a general method of maintaining repression of transcription. In addition, a
variety of other mechanisms which affect the chromatin environment of a gene and
hence its capacity for gene expression are known to operate in mammalian cells. In
some cases, the mechanisms ensure that within a cell only one of the two parentally
inherited alleles is normally expressed, even though the nucleotide sequence of
the allele which is not expressed may be identical to the one which is
expressed.
Table 8.1
Overview of the regulation of gene expression in human cells
Table 8.1 provides an overview of the
different types of mechanism known to be involved in regulating expression of human
genes.
8.2. Control of gene expression by binding of trans-acting protein
factors to cis-acting regulatory sequences in DNA and RNA
A common molecular basis for much of the control of gene expression (whether it
occurs at the level of initiation of transcription, RNA processing, translation or
RNA transport) is the binding of protein factors to regulatory nucleic acid
sequences. The latter can be DNA sequences found in the vicinity of the gene or even
within it, or RNA transcript sequences at the level of precursor RNA or mRNA. As the
protein factors engaged in regulating gene expression are themselves encoded by
distantly located genes, they are required to migrate to their site of action, and
so are called trans-acting factors. In contrast, the regulatory sequences to which
they bind are on the same DNA or RNA molecule as the gene or RNA transcript that
is being regulated. Such sequences are said to be cis-acting.
Control by DNA-protein binding
A major control of gene expression in eukaryotic cells is exerted at the level of
initiation of transcription where three different types of RNA
polymerase are known to transcribe different classes of genes (see Table 1.3). All three types of RNA polymerase
are large enzymes, consisting of 8–14 subunits, and in each case the
polymerase is recruited to transcribe a gene following binding of proteins
(transcription factors) to specific regulatory DNA sequences within the gene or in
its vicinity. Chromatin is a highly organized and densely packed structure which
does not easily afford access to RNA polymerases and so transcription factors are
required to help activate it to give a more open structure that will enable
transcription to take place.
Control by RNA-protein binding
In addition to transcription factors, RNA-binding proteins are used to regulate gene
expression. The best-studied examples involve binding to regulatory sequences in the
untranslated sequences of mRNA, permitting translational control of gene expression.
In addition, specific RNA-protein binding interactions are expected to be involved
in the control of gene expression at the level of differential RNA processing too,
as in the case of binding of SR and hnRNP proteins to pre-mRNA in order to modulate
the choice of exons in splicing. The latter mechanisms are considered separately in
Section 8.3 to illustrate the tremendous
complexity of expression mechanisms that can be used to decode single genes, and the
significance of the large numbers of isoforms that can be produced as a result.
8.2.1. Ubiquitous transcription factors are required for transcription by RNA
polymerases I and III
RNA polymerases I and III in eukaryotic cells are dedicated to transcribing genes
to give RNA molecules (rRNA, tRNA, etc.) which assist in expression of the
polypeptide-encoding genes. The transcribed genes are housekeeping genes since
rRNA and tRNA are required in essentially all cells to assist in protein
synthesis. As a result, ubiquitous transcription factors are required to assist
RNA polymerases I and III.
Transcription by RNA polymerase I
Figure 8.1
The major human rRNA species are synthesized by cleavage from
a common 13 kb transcription unit which is part of a 40 kb
tandemly repeated unit
Small arrows indicated by letters A-D signify positions of
endonuclease cleavage of RNA precursors. Cleavage of the 41S
precursor at B generates two products: 20S + 32S. Following
cleavage of the 32S precursor at D, and excision of the small
5.8S rRNA, hydrogen bonding takes place between the 5.8S rRNA
and a complementary central segment of the 28S rRNA. The
approximately 6 kb of RNA sequence originating from the external
and internal transcribed spacer units (ETS, ITS1 and ITS2) are
degraded in the nucleus. S is the sedimentation coefficient, a
measure of size.
RNA polymerase I is confined to the nucleolus and is devoted to transcription of the
18S, 5.8S and 28S rRNA genes. The latter are consecutively organized on a common
13 kb transcription unit (Figure 8.1). A compound unit of the 13 kb transcription
unit and an adjacent 27 kb nontranscribed spacer is tandemly repeated about
50–60 times on the short arms of each of the five human acrocentric chromosomes,
at the nucleolar organizer regions (see Figure 2.18). The resulting five clusters
of rRNA genes, each about 2 Mb long, are referred to as ribosomal DNA or rDNA.
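The sizes quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch that uses only the figures given in the text (13 kb transcription unit, 27 kb spacer, 50–60 repeats per cluster, five clusters); the variable and function names are illustrative assumptions.

```python
# Figures taken from the text; names are illustrative only.
TRANSCRIPTION_UNIT_KB = 13
SPACER_KB = 27
REPEATS_PER_CLUSTER = (50, 60)   # lower and upper estimates
CLUSTERS = 5                     # one per acrocentric chromosome


def cluster_size_mb(repeats):
    """Approximate length of one rDNA cluster in Mb."""
    repeat_unit_kb = TRANSCRIPTION_UNIT_KB + SPACER_KB   # the 40 kb compound unit
    return repeat_unit_kb * repeats / 1000


# Size range of a single cluster, and total repeat copies per haploid genome.
cluster_range_mb = tuple(cluster_size_mb(n) for n in REPEATS_PER_CLUSTER)
total_copies = tuple(n * CLUSTERS for n in REPEATS_PER_CLUSTER)
```

With 50 repeats the cluster length comes to 2.0 Mb, which agrees with the "about 2 Mb" quoted for each cluster.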
Figure 8.2
Initiation of transcription by RNA polymerase I
One possible model envisages initial binding of the two identical
subunits of the upstream binding factor to the upstream control
element and the core promoter element, forcing these two
sequences into close proximity and enabling their
subsequent binding by the selectivity factor 1 (SL1), which
consists of four subunits. The stabilized structure permits
binding of other factors (not shown) and then of
RNA polymerase I.
Transcription of the 28S, 5.8S and 18S rRNA genes is initiated following binding of
two transcription factors to a core promoter element at the transcription initiation
site and an upstream control element located over 100 nucleotides upstream. One of
the transcription factors, UBF (upstream binding factor), is a homodimer and its
identical subunits may bind first to the core promoter element and upstream control
element, bringing them together so that they can be bound by the second factor, SL1
(selectivity factor 1; known in mouse as TIF-IB; Figure 8.2). The bound
transcription factors subsequently recruit RNA polymerase I to form an initiation
complex.
The primary transcript expressed from the single 13 kb transcription unit is a 45S
precursor rRNA which undergoes a variety of cleavage reactions and base-specific
modifications (carried out by a large number of different types of small nucleolar
RNA (snoRNA)) to generate the mature 28S, 5.8S and 18S rRNA species (see
Figure 8.1). Thus, these genes differ from the vast majority of nuclear genes, which
are individually transcribed. Instead, rDNA transcription resembles mtDNA
transcription (see Section 7.1.1 and Figure 7.2): both result in multigenic
transcripts which yield functionally related products. This unusual use of
multigenic primary transcripts is no different in principle, however, from the way
in which a single primary translation product is occasionally cleaved to generate
two or more functionally related polypeptides (see the example of human insulin in
Figure 1.23).
Transcription by RNA polymerase III
Figure 8.3
tRNA and 5S rRNA genes have promoters located within the
coding sequence
(A) Positions of promoter elements in tRNA and 5S
rRNA genes. The promoter elements A and B in the tRNA genes are
located in the sequences specifying the D loop and the TψCG
loop respectively (see tRNA structure in Figure 1.7B). (B) Initiation
of transcription of a tRNA gene. Binding of the TFIIIC
transcription factor to the promoter elements permits subsequent
binding of the trimeric TFIIIB factor to the sequence
immediately upstream of the transcription start site. In
response to binding of the TFIIIB factor, RNA polymerase III
binds and initiates transcription. In the case of the 5S rRNA
genes, a similar mechanism occurs but in this case an additional
transcription factor TFIIIA is required to bind to the C box,
and the bound TFIIIA factor permits subsequent binding of TFIIIC
followed by recruitment of TFIIIB and RNA polymerase III as in
the case of tRNA genes.
RNA polymerase III is also involved in transcription of a variety of housekeeping
genes, encoding various small stable RNA molecules such as 5S rRNA, tRNA molecules,
7SL RNA and some of the snRNA molecules needed for RNA splicing. These genes are
characterized by promoters that lie within the coding sequence of the gene, rather
than upstream of it. In tRNA genes, the promoter is bipartite, consisting of two
well conserved sequences, the A box and the B box, while in the 5S rRNA gene, a
single promoter element is present, the C box. In each case, transcription by RNA
polymerase III is thought to proceed by binding of ubiquitous transcription factors
to the promoter elements, followed by binding of other factors and finally
recruitment of the polymerase (Figure 8.3).
8.2.2. Transcription of polypeptide-encoding genes often requires complex sets of
cis-acting transcriptional control sequences and tissue-specific transcription
factors
RNA polymerase II is responsible for transcribing all genes which encode
polypeptides and also certain species of snRNA gene. Like RNA polymerases I and
III, RNA polymerase II is dependent on auxiliary general transcription factors
(the usual nomenclature has a common prefix TF to denote transcription factor
followed by a Roman numeral to denote the associated RNA polymerase). In the
case of RNA polymerase II, there are a variety of auxiliary transcription
factors such as TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, etc., which can be
complex in structure (Nikolov and Burley, 1997). For example, the TATA box-binding
protein (TBP) is only one
of the multiple protein subunits that make up TFIID and the associated proteins
are known as TBP-associated factors, or TAF proteins. The complex of polymerase
and general transcription factors is known as the basal transcription apparatus;
it constitutes all that is required to initiate transcription. Genes are
constitutively expressed at a minimum rate determined by the core promoter (see
below) unless transcription is increased or switched off by
additional positive or negative regulatory elements (which may be located some
distance away) or by intrinsic components of the promoter itself.
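The naming convention described above (the prefix TF, a Roman numeral for the associated polymerase, then a letter identifying the factor) is regular enough to be decoded mechanically. The sketch below is only an illustration of that convention: the function name and the lookup table are assumptions, and the pattern deliberately covers just the simple single-letter names used in this section.

```python
import re

# "TF" + Roman numeral (I, II or III) + one factor letter.
# The letter class excludes "I" so that the numeral is unambiguous;
# this restriction is an assumption of the sketch.
_TF_NAME = re.compile(r"^TF(III|II|I)([A-HJ-Z])$")

_POLYMERASE = {
    "I": "RNA polymerase I",
    "II": "RNA polymerase II",
    "III": "RNA polymerase III",
}


def parse_general_tf(name):
    """Return (polymerase, factor letter), or None if the name does not fit."""
    match = _TF_NAME.match(name)
    if match is None:
        return None
    numeral, letter = match.groups()
    return _POLYMERASE[numeral], letter
```

Names that do not follow the convention, such as UBF or SL1, simply return None.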
Some of the genes which encode polypeptides are housekeeping genes, but unlike
the products of genes transcribed by RNA polymerases I and III, a large
percentage of genes transcribed by RNA polymerase II show tissue-restricted or
tissue-specific expression patterns. Since the DNA in different nucleated cells
of an individual is essentially identical, the identity of a cell, whether it be
a hepatocyte or a T lymphocyte for instance, is defined by the proteins made by
the cell. In addition to general ubiquitous transcription factors, therefore,
tissue-specific or tissue-restricted transcription factors regulate the
expression of many genes which encode polypeptides, by recognizing and binding
specific cis-acting sequence elements.
Partly because of the large size of mammalian nuclear genomes and also because of
the general need for more sophisticated control systems imposed by having very
large numbers of interacting genes, control elements in eukaryotic cells are
quite elaborate. Often, expression of individual human genes is controlled by
several sets of cis-acting regulatory elements. While the individual regulatory
elements may be composed of multiple short sequence elements (typically 4–8
nucleotides long) distributed over a few hundred base pairs, the different classes
of regulatory element which modulate the expression of a single gene may be located
at considerable distances from one another. A variety of different types of
cis-acting elements can be recognized, including promoters, enhancers, silencers,
boundary elements (insulators) and response elements (see Box 8.2).
Figure 8.6
The HS-40 α-globin regulatory site contains many
recognition elements for erythroid-specific transcription
factors
Note that the HS-40 site appears to be a
locus control region for the
α-globin gene cluster (see Section 8.5.2).
Tissue specificity and developmental stage specificity of gene expression are often
conferred by enhancer and silencer sequences, and a variety of cis-acting sequences
have been identified which are specifically recognized by tissue-specific
transcription factors. For example, specific expression in erythroid cells is often
signalled by one of two sequences: TGACTCAG (or its reverse complement CTGAGTCA),
which is recognized by the erythroid-specific transcription factor NF-E2, or the
sequence (A/T)GATA(A/G) (or its reverse complement), which is recognized by the GATA
series of erythroid-specific transcription factors (see Figure 8.6 for an example).
Some other examples of cis-acting sequence elements recognized by tissue-specific or
tissue-restricted transcription factors are listed in Table 8.2.
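Because a recognition element can lie on either strand, a search for the two erythroid motifs quoted above has to consider each sequence and its reverse complement. The following is a minimal sketch of such a scan; the motif strings come from the text, while the function names, the dictionary layout and the 0-based coordinate convention are assumptions made for illustration.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Motifs quoted in the text, written as regular expressions.
MOTIFS = {
    "NF-E2": "TGACTCAG",
    "GATA": "[AT]GATA[AG]",      # (A/T)GATA(A/G)
}


def find_sites(seq):
    """Return sorted (position, strand, factor) tuples.

    Positions are 0-based offsets on the supplied sequence; '-' marks a
    motif found on the reverse-complement strand.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    hits = []
    for factor, pattern in MOTIFS.items():
        # A lookahead lets overlapping occurrences be reported as well.
        regex = re.compile(f"(?=({pattern}))")
        for m in regex.finditer(seq):
            hits.append((m.start(), "+", factor))
        for m in regex.finditer(rc):
            width = len(m.group(1))
            hits.append((len(seq) - m.start() - width, "-", factor))
    return sorted(hits)
```

A match reported here indicates only that the sequence fits the motif; whether a site is actually bound in a given cell cannot be inferred from sequence alone.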
While some cis-acting elements actively promote tissue-specific transcription, some
cis-acting silencer elements confer tissue or developmental
stage specificity by blocking expression in all but the desired tissue. For
example, the neural restrictive silencer element (NRSE) represses expression of
several genes in all tissues other than neural tissues (Schoenherr et al., 1996). A
transcription factor that binds to the NRSE and which is variously called the
neural restrictive silencer factor (NRSF) or the RE-1 silencing transcription
factor (REST) is ubiquitously expressed in non-neural tissue and neuronal
precursors during early development but subsequently it is specifically
not expressed in more mature (postmitotic) neurons.
8.2.3. Transcription factors contain conserved structural motifs that permit DNA
binding
Transcription factors recognize and bind a short nucleotide sequence, usually as
a result of extensive complementarity between the surface of the protein and
surface features of the double helix in the region of binding. Although the
individual interactions between the amino acids and nucleotides are weak
(usually hydrogen bonds, ionic bonds and hydrophobic interactions), the region
of DNA-protein binding is typically characterized by about 20 such contacts,
which collectively ensure that the binding is strong and specific. In human and
other eukaryotic transcription factors, two distinct functions can often be
identified and located in different parts of the protein:
- An activation domain. As the name suggests, this type of
domain functions in activating transcription of the target genes once
the transcription factor has bound to them. Activation domains are thought
to stimulate transcription by interacting with basal transcription
factors so as to assist the formation of the transcription complex on
the promoter. Although not so well-studied as DNA-binding domains, some
are known to be rich in aspartate and glutamate residues (acidic
activation domains); others are rich in proline or glutamine.
- A DNA-binding domain. This type of domain is necessary to
permit specific binding of the transcription factor to its target genes.
In contrast to activation domains, DNA-binding domains of transcription
factors have been well-studied. A number of conserved structural motifs
have been identified which are common to many different transcription
factors with quite different specificities, including the leucine
zipper, helix-loop-helix, helix-turn-helix, and zinc finger motifs which are
described below. Each of the motifs uses α-helices (or
occasionally β-sheets; see Figure 1.24) to bind to the major groove of DNA. Clearly,
although the motifs in general provide the basis for DNA binding, the
precise collection of sequence elements in the DNA-binding domain will
provide the basis for the required sequence-specific recognition. Most
transcription factors bind to DNA as homodimers, with the DNA-binding
region of the protein usually distinct from the region responsible for
forming dimers.
The leucine zipper motif
Figure 8.7
Structural motifs commonly found in transcription factors and
DNA-binding proteins
Abbreviations: HTH, helix-turn-helix; HLH, helix-loop-helix.
Note that the leucine zipper monomer is
amphipathic [i.e. has hydrophobic residues
(leucines) consistently on one face of the helix]. Two such
helices can align with their hydrophobic faces in opposition to
form a coiled-coil structure.
Figure 8.8
Binding of conserved structural motifs in transcription
factors to the double helix
Note that the individual monomers of the
helix-loop-helix (HLH) dimer and the leucine zipper dimer are
colored differently to permit distinction, but may be identical
(homodimers). HLH heterodimers and leucine zipper heterodimers
may provide a higher level of regulation (see text).
The leucine zipper is a helical stretch of amino acids rich in leucine residues
(typically occurring once every seven amino acid residues, i.e. once every two turns
of the helix; see Figure 8.7), which readily forms a dimer. Each monomer unit
consists of an amphipathic α-helix (hydrophobic side groups of the constituent amino
acids face one way; polar groups face the other way, see Figure 1.24). The two
α-helices of the individual monomers join together over a short distance to form a
coiled-coil (see Section 1.5.5) with the predominant interactions occurring between
opposed hydrophobic amino acids of the individual monomers. Beyond this region the
two α-helices separate, so that the overall dimer is a Y-shaped structure. The dimer
is thought to grip the double helix much like a clothes peg grips a clothes line
(Figure 8.8). In addition to forming homodimers, leucine zipper proteins can
occasionally form heterodimers depending on the compatibility of the hydrophobic
surfaces of the two different monomers. Such heterodimer formation provides an
important combinatorial control mechanism in gene regulation.
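The seven-residue spacing of the leucines can be looked for directly in a protein sequence. The sketch below is an illustration only: it reports runs of leucines spaced exactly seven residues apart, and the threshold of four consecutive leucines, the function name and the figure of 3.6 residues per α-helical turn (a standard textbook value, not stated in this section) are assumptions.

```python
RESIDUES_PER_TURN = 3.6   # standard alpha-helix geometry (assumed, not from the text)
HEPTAD = 7                # leucine spacing quoted in the text


def leucine_heptad_runs(protein, min_repeats=4):
    """Return (start, count) for each run of leucines spaced 7 residues apart.

    start is the 0-based position of the first leucine in the run and count
    is the number of leucines in it. min_repeats is an arbitrary threshold.
    """
    protein = protein.upper()
    runs = []
    for start, residue in enumerate(protein):
        if residue != "L":
            continue
        # Skip leucines that merely continue a run starting further upstream.
        if start >= HEPTAD and protein[start - HEPTAD] == "L":
            continue
        count = 1
        pos = start + HEPTAD
        while pos < len(protein) and protein[pos] == "L":
            count += 1
            pos += HEPTAD
        if count >= min_repeats:
            runs.append((start, count))
    return runs


# Seven residues correspond to just under two turns of the helix.
turns_per_heptad = HEPTAD / RESIDUES_PER_TURN
```

Seven residues at 3.6 residues per turn is about 1.94 turns, which is why successive leucines fall on nearly the same face of the helix. A run found this way is only suggestive; a real zipper also requires that the segment is actually helical.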
The helix-loop-helix motif
The helix-loop-helix (HLH) motif is related to the leucine zipper and should
be distinguished from the helix-turn-helix (HTH) motif described in the
next section. It consists of two α-helices, one short and one
long, connected by a flexible loop. Unlike the short turn in the HTH motif,
the loop in the HLH motif is flexible enough to permit folding back so that
the two helices can pack against each other; that is, the two helices lie in
planes that are parallel to each other, in contrast to the two helices in
the HTH motif (Figure 8.7). The HLH motif mediates both DNA binding and protein
dimer formation (Figure 8.8) and it permits occasional
heterodimer formation. In the latter case, however, heterodimers form
between a full-length HLH protein and a truncated HLH protein which lacks
the full length of the α-helix necessary to bind to the DNA. The
resulting heterodimer is unable to bind DNA tightly. As a result, HLH heterodimers
are thought to act as a control mechanism, by enabling inactivation of
specific gene regulatory proteins.
The helix-turn-helix motif
The HTH motif is a common motif found in homeodomain proteins (which are encoded by
homeobox genes) and in a number of other
transcription factors. It consists of two short α-helices
separated by a short amino acid sequence which induces a turn, so that the
two α-helices are orientated differently (i.e. the two helices do
not lie in the same plane, unlike those in the HLH motif; Figure 8.7). The structure
is very similar to the DNA-binding motif of several bacteriophage regulatory
proteins such as the λ cro protein whose binding to DNA has been
intensively studied by X-ray crystallography. In the case of both the
λ cro protein and eukaryotic HTH motifs, it is thought that while
the HTH motif in general mediates DNA binding, the more C-terminal helix
acts as a specific recognition helix because it fits into the major groove
of the DNA (Figure 8.8), controlling
the precise DNA sequence which is recognized.
The zinc finger motif
The zinc finger motif involves binding a zinc ion by four conserved amino
acids to form a loop (finger), a structure which is often tandemly repeated.
Although several different forms exist, common forms involve binding of a
Zn2+ ion by two conserved cysteine residues and two conserved
histidine residues, or by four conserved cysteine residues. The resulting
structure may then consist of an α-helix and a β-sheet
held together by coordination with the Zn2+ ion, or of two
α-helices. In either case, the primary contact with the DNA is
made by an α-helix binding to the major groove. The so-called
Cys2/His2 finger typically comprises about 23
amino acids with neighboring fingers separated by a stretch of about seven
or eight amino acids (Figure 8.7).
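A Cys2/His2 finger can be recognized in a protein sequence from the spacing of its four zinc-binding residues. The sketch below assumes a commonly quoted consensus spacing, C-x(2,4)-C-x(12)-H-x(3,5)-H, which is not given in the text and is a simplification; it spans 21–25 residues, in line with the figure of about 23 amino acids quoted above. The function name is hypothetical.

```python
import re

# Assumed consensus spacing (not from the text): C-x(2,4)-C-x(12)-H-x(3,5)-H.
# "." stands for any residue.
C2H2_FINGER = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_fingers(protein):
    """Return (start, matched segment) for each non-overlapping candidate finger."""
    return [(m.start(), m.group()) for m in C2H2_FINGER.finditer(protein.upper())]
```

Applied to a protein with tandemly repeated fingers, the gaps between successive matches give the linker lengths, which can be compared with the seven or eight residues mentioned above. Other finger classes, such as those using four cysteines, would need a different pattern.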
8.2.4. A variety of mechanisms permit transcriptional regulation of gene expression
in response to external stimuli
Table 8.3
Different modes of cell signaling
| Mode of signaling | Mechanism | Examples |
| Direct cell to cell signaling | A signal on the surface of one cell is bound by a specific receptor on another cell | Membrane-anchored growth factors and their receptors |
| Endocrine signaling | Hormones are secreted by specialized endocrine cells and carried through the circulation to bind to receptors in target cells at distant locations in the body | Release of glucocorticoid hormones by the adrenal glands |
| Paracrine signaling | A molecule released from one cell acts locally to affect nearby target cells | Neurotransmitters and receptors; nitric oxide in the immune system, nervous system etc. |
| Autocrine signaling | A cell produces a signaling molecule to which it also responds | T lymphocytes can respond to antigenic stimulation by synthesizing factors that drive their own proliferation |
In eukaryotic cells, gene expression can be altered in a semipermanent way as
cells differentiate, or in a temporary, easily reversible way in response to
extracellular signals (inducible gene expression). Environmental cues such as
the extracellular concentrations of certain ions and small nutrient molecules,
temperature, shock, etc., can result in dramatic alteration of gene expression
patterns in cells exposed to changes in these parameters. In complex
multicellular animals there are also fundamental requirements for cells to
communicate with each other and different modes of cell signaling are possible
(Table 8.3). In some cases, alteration of gene expression is conducted at the
translational level, which can offer certain advantages (Section 8.2.5).
In other cases, gene expression is altered by modulating transcription.
Transcriptional regulation in response to cell signaling can take different
forms, but the endpoint is always the same: a previously inactive transcription
factor is specifically activated by the signaling pathway and then subsequently
binds to specific regulatory sequences located in the promoters of target genes,
thereby activating their transcription. In the case of transcription regulated
by signaling molecules or their intermediaries, such regulatory sequences are
often referred to as response elements (see Table 8.4).
Ligand-inducible transcription factors
Small hydrophobic hormones and morphogens such as steroid hormones, thyroxine
and retinoic acid are able to diffuse through the plasma membrane of the
target cell and bind intracellular receptors in the cytoplasm or nucleus.
These receptors (often called hormone nuclear receptors) are
inducible transcription factors. Following binding of the homologous ligand,
the receptor protein associates with a specific DNA response element located
in the promoter regions of perhaps 50–100 target genes and
activates their transcription.
Figure 8.10
Transcriptional regulation by glucocorticoids
The glucocorticoid receptor is normally inactivated by being
bound to an inhibitor protein, Hsp90. Binding of glucocorticoids
to the glucocorticoid receptor releases Hsp90, the receptor
dimerizes and then activates selected genes which have a
glucocorticoid response element in their
promoter (see ).
Although thyroxine and retinoic acid are structurally and biosynthetically
unrelated to the steroid hormones, their receptors belong to a common
nuclear receptor superfamily. Two conserved domains characterize the family:
a centrally located DNA-binding domain of about 68 amino acids,
and a ~240 amino acid ligand-binding domain located close to
the C terminus (). The DNA-binding
domain contains zinc fingers and binds as a dimer with each monomer
recognizing one of two hexanucleotides in the response element. The two
hexanucleotides are either inverted repeats or direct repeats which are
typically separated by three or five nucleotides (). In the absence of the ligand,
the receptor is inactivated by direct repression of the DNA-binding domain
function by the ligand-binding domain, or by binding to an inhibitory
protein, as in the case of the glucocorticoid receptor (Figure 8.10).
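The architecture of such response elements (two hexanucleotide half-sites arranged as a direct or inverted repeat, separated by three or five nucleotides) can be expressed as a simple search. The sketch below is illustrative only: the function names are assumptions, only the supplied strand is scanned, and the half-site sequence is an input chosen by the user since the text does not specify one.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_response_elements(seq, half_site, spacers=(3, 5)):
    """Return (position, arrangement, spacer) for each pair of half-sites.

    A 'direct' repeat is half_site + spacer + half_site; an 'inverted'
    repeat is half_site + spacer + reverse complement of half_site.
    Positions are 0-based and refer to the first half-site.
    """
    seq = seq.upper()
    half_site = half_site.upper()
    n = len(half_site)
    partners = {
        "direct": half_site,
        "inverted": reverse_complement(half_site),
    }
    hits = []
    for i in range(len(seq) - n + 1):
        if seq[i:i + n] != half_site:
            continue
        for gap in spacers:
            second = seq[i + n + gap:i + 2 * n + gap]
            for arrangement, expected in partners.items():
                if second == expected:
                    hits.append((i, arrangement, gap))
    return hits
```

For example, with the hexanucleotide AGGTCA as a purely illustrative half-site, the sequence AGGTCACAGTGACCT is reported as an inverted repeat with a three-nucleotide spacer. Because the two halves of an inverted repeat read the same on opposite strands, each monomer of the receptor dimer can contact an equivalent half-site.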
Activation of transcription factors by signal transduction
Table 8.5
Major classes of cell surface receptor
| Receptor class | Mode of action | Ligands |
| G protein-coupled | Activate heterotrimeric G-proteins (GTP-binding regulatory proteins). The latter consist of three subunits, α, β and γ. Upon activation, the α subunit translocates to activate a target protein (see for an example) | Various molecules, including peptides, hormones, e.g. epinephrine, etc. |
| Serine-threonine kinase | Activate intracellular proteins by phosphorylating serine or threonine residues on target proteins | Hormones, growth factors |
| Tyrosine kinase | Activate intracellular proteins by phosphorylating tyrosine on target proteins, e.g. PDGF receptor, insulin receptor, etc. (see ) | Hormones, growth factors |
| Tyrosine kinase-associated | Receptors do not phosphorylate target proteins directly, but instead rely on an associated tyrosine kinase. Examples include cytokine receptors with associated JAK (Janus protein kinase) activity involved in JAK-STAT signaling | Hormones, growth factors |
| Ion channel-linked | Involved in rapid synaptic signaling | Neurotransmitters |
Unlike lipid-soluble hormones or morphogens, hydrophilic signaling molecules
such as polypeptide hormones cannot diffuse through the plasma membrane.
Instead, they bind to a specific receptor on the cell surface. After binding
of the ligand molecule, the receptor undergoes a conformational change and
becomes activated in such a way that it passes on the signal via other
molecules within the cell (signal transduction). Various classes of cell
surface receptor are known but many of them have a kinase activity or can
activate intracellular kinases (see Table 8.5). Signal transduction pathways are
often characterized by
complex regulatory interplay between kinases and phosphatases which can
activate or repress intermediates by phosphorylation/dephosphorylation. In
many cases, the phosphorylation or dephosphorylation induces an altered
conformation. In the case of activation of a signaling molecule, the altered
conformation often means that a signaling factor is no longer inhibited by
some repressor sequence present in an inhibitory protein to which it is
bound, or in a domain or sequence motif within its own structure.
In terms of transcriptional activation, two general mechanisms permit rapid
transmission of signals from cell-surface receptors to the nucleus, both
involving protein phosphorylation:
- protein kinases are activated and then translocated from the
cytoplasm to the nucleus where they phosphorylate target
transcription factors;
- inactive transcription factors stored in the cytoplasm are activated
by phosphorylation and translocated into the nucleus.
The following two sections provide examples to illustrate these two
mechanisms (see also Karin and Hunter, 1995).
Hormonal signaling through the cyclic AMP pathway
Table 8.6
Examples of secondary messengers in cell signaling
| Messenger | Production and mode of action | Example |
| Cyclic AMP (cAMP) | Produced from ATP by adenylate cyclase. Effects are usually mediated through protein kinase A | Activation of CREB transcription factor (see ) |
| Cyclic GMP (cGMP) | Produced from GTP by guanylate cyclase. Best characterized role is in visual reception in the vertebrate eye | |
| Phospholipids/Ca2+ | Activated downstream of G protein-coupled receptors and protein tyrosine kinases. Hydrolysis of phosphatidylinositol 4,5-bisphosphate (PIP2) yields diacylglycerol and inositol 1,4,5-trisphosphate (IP3) which, respectively, activate protein kinase C and mobilize Ca2+ from intracellular stores | See |
Cyclic AMP is an important second messenger (see Table 8.6) which acts in response to
a variety of hormones and other signaling molecules. It is synthesized from
ATP by a membrane bound enzyme, adenylate cyclase. Hormones which activate
adenylate cyclase bind to a cell surface receptor which is of the G
protein-coupled receptor class. Binding of the hormone to the receptor
promotes the interaction of the receptor with a G protein which consists of
three subunits, α, β and γ. Following this
interaction the α subunit of the G protein is activated, causing
it to dissociate and stimulate adenylate cyclase.
The increase in intracellular cAMP produced by activated adenylate cyclase
can then activate the transcription of specific target sequences that
contain a cAMP response element or CRE. This function of cAMP is mediated by
the enzyme protein kinase A. Cyclic AMP binds to protein kinase A and
activates it by permitting release of the two catalytically active subunits,
which then enter the nucleus and phosphorylate a specific transcription
factor, CREB (CRE-binding protein). Activated CREB then activates
transcription of genes with the cAMP response element.
Activation of NF-κB via protein kinase C signaling
NF-κB is a transcription factor which is involved in a variety of aspects of
the immune response. In its inactive state, NF-κB is retained in the
cytoplasm where it is complexed with an inhibitory subunit, IκB. However,
the latter can be targeted for degradation following phosphorylation by
protein kinase C. The consequent destruction of IκB permits NF-κB to
translocate to the nucleus and activate its various target genes. Protein
kinase C is activated by diacylglycerol. The latter is produced when binding
of various growth factors and hormones to specific cell surface receptors
triggers activation of receptor-linked phospholipase C activity. The
activated enzyme converts PIP2 (phosphatidylinositol 4,5-bisphosphate) to
IP3 (inositol 1,4,5-trisphosphate) and diacylglycerol.
8.2.5. Translational control of gene expression can involve specific recognition by
RNA-binding proteins of regulatory sequences within the untranslated sequences
of mRNA
Different forms of translational control of gene expression are evident and an
increasing number of eukaryotic and mammalian mRNA species have been shown to
contain regulatory sequences in their untranslated sequences (most frequently at
the 3′ end; see Wickens et al., 1997; Day and Tuite, 1998). Several
eukaryotic and mammalian RNA-binding proteins have also been identified and
shown to bind to specific regulatory sequences present in untranslated
sequences, thereby providing the basis for translational control of gene
expression (Siomi and Dreyfuss, 1997). A variety of different RNA-binding
domains have been
identified and they include elements which have previously been associated with
DNA-binding properties of transcription factors such as zinc fingers and
homeodomains (see Section 8.2.3 and Siomi and Dreyfuss, 1997).
Intracellular RNA localization
The interaction between cis-acting regulatory elements in
RNA and trans-acting RNA-binding proteins can be envisaged
to alter RNA structure in various ways: facilitating or hindering
interactions with other trans-acting factors; altering
higher order RNA structure; bringing together initially remote RNA
sequences; or providing localization or targeting signals for transport of
RNA molecules to specific intracellular locations. In the latter case,
numerous eukaryotic and mammalian mRNAs are known to be transported as RNP
particles to specific locations within cells, and transport on microtubules
and actin filaments has been demonstrated in some cases to require specific
molecular motors (Hazelrigg, 1998).
For example, tau mRNA is localized to the proximal portions of axons rather
than to dendrites, where many mRNA molecules are located in mature neurons,
and myelin basic protein mRNA is transported with the aid of kinesin to the
processes of oligodendrocytes.
A rationale for such intracellular mRNA localization mechanisms is that they
may provide a more efficient way to localize protein products than protein
targeting: as detailed below, a single mRNA can give rise to many different
protein molecules, assuming that it can engage with ribosomes. Different
sequential steps have been envisaged: initial translational repression,
transport within the cell, localization (to the specific subcellular
destination) and then localization-dependent translation. Recently, key
regulatory sequences which are required for various steps in this process
have been identified in the untranslated sequences, predominantly the
3′ UTR, of many mRNA species (Hazelrigg, 1998). For example, two elements within the
3′ UTR of myelin basic protein mRNA are required for distinct
steps in its transport to the processes of oligodendrocytes: a 21 nucleotide
RNA transport sequence and a longer RNA localization region (Ainger et al.,
1997).
Translational control of gene expression in response to external
stimuli
Translational control of gene expression can permit a more rapid response to
altered environmental stimuli than the alternative of activating
transcription. Iron metabolism provides two useful examples. Increased iron
levels stimulate the synthesis of the iron-binding protein, ferritin,
without any corresponding increase in the amount of ferritin mRNA.
Conversely, decreased iron levels stimulate the production of transferrin
receptor (TfR) without any effect on the production of transferrin receptor
mRNA. The 5′ UTR of both ferritin heavy chain mRNA and light chain mRNA
contains a single iron-response element (IRE), a specific cis-acting
regulatory sequence which forms a hairpin structure. Several such IRE
sequences are also found in the 3′ UTR of the transferrin receptor mRNA (see
Klausner et al., 1993). Regulation is exerted by the binding to the IREs of
a specific IRE-binding protein which is activated at low iron levels.
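The opposite responses of ferritin and transferrin receptor to iron can be summarized as a small decision rule. The sketch below is illustrative only: the function name and dictionary keys are hypothetical, and it assumes (as is generally described, though not spelled out above) that binding of the IRE-binding protein at a 5′ UTR IRE blocks translation whereas binding at 3′ UTR IREs protects the mRNA from degradation.

```python
def ire_response(iron_high: bool) -> dict:
    """Qualitative outcome of IRE-mediated control for two mRNAs."""
    # The IRE-binding protein is active (able to bind IREs) at low iron.
    ire_bp_bound = not iron_high
    return {
        # Assumption: binding at the 5' UTR IRE blocks ferritin translation.
        "ferritin_translated": not ire_bp_bound,
        # Assumption: binding at the 3' UTR IREs stabilizes TfR mRNA.
        "tfr_mrna_protected": ire_bp_bound,
    }


if __name__ == "__main__":
    for iron_high in (True, False):
        print("iron high" if iron_high else "iron low", ire_response(iron_high))
```

In both cases the amount of mRNA being transcribed is not an input to the rule, which is the point of the example: the response is set after transcription.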
Translational control of gene expression during early development
Gene expression during oocyte maturation and at the earliest embryonic stages
is regulated at the level of translation, not transcription. Following
fertilization of a human oocyte, no mRNA is made until the
4–8 cell stage when zygotic transcription is activated, that is,
transcription of the genes present in the zygote. Before this time, cell
functions are specified by maternal mRNA that was previously synthesized
during oogenesis. While it is presently unclear to what extent the
regulation of human gene expression parallels that of model organisms at
this stage, extrapolation from the latter would suggest that a variety of
mRNAs are stored in oocytes in an inactive form, characterized by having
short oligo(A) tails. Such mRNAs were previously subject to deadenylation
and the resulting short oligo(A) tail means that they cannot be translated.
Subsequently, at fertilization or later in development, the stored inactive
mRNA species can be activated by cytoplasmic polyadenylation, restoring the
normal size poly(A) tail. Cytoplasmic polyadenylation appears to use the
same type of poly(A) polymerase activity as in the standard polyadenylation
of newly formed mRNA (which occurs in the nucleus). However, in addition to
the AAUAAA signal, the mRNA needs to have a uridine-rich upstream
cytoplasmic polyadenylation element (Wahle and Kuhn, 1997). Another
mechanism that is used to regulate translation of some mRNAs during
development is translational repression (masking) whereby RNA-binding
proteins can recognize and bind specific sequences in the 3′ UTRs
of the mRNAs, thereby repressing translation (see Wickens et al., 1997; Stebbins-Boaz and Richter, 1997).
8.3. Alternative transcription and processing of individual genes
In addition to the control that is exerted in selecting specific genes (or their
transcripts) for activation or repression, control mechanisms can also select
between specific alternative transcripts of a single gene. Differential
promoter usage or differential RNA processing events can result in a large
number of different isoforms, and these and other mechanisms have challenged
the classical definition of a gene (Box 8.3).
8.3.1. Transcription of a single human gene can be initiated from a variety of
alternative promoters and can result in a variety of tissue-specific
isoforms
Several human and mammalian genes are known to have two or more alternative
promoters, which can result in different isoforms with different properties (see
Ayoubi and van de Ven, 1996). The
isoforms can provide:
- tissue-specificity (a frequent occurrence; see the example of the human
  dystrophin gene below);
- developmental stage-specificity (e.g. the insulin-like growth factor II
  gene);
- differential subcellular localization (e.g. soluble and membrane-bound
  isoforms);
- differential functional capacity (as in the case of the progesterone
  receptor);
- sex-specific gene regulation (see the case of the Dnmt1
  methyltransferase gene in Section 8.4.2).
One of the most celebrated examples of differential promoter usage in humans
concerns the giant dystrophin (DMD) gene, which comprises a total of more
than 79 exons distributed over about 2.4 Mb of DNA in Xp21. At least eight
different alternative promoters can be used (Cox and Kunkel, 1997). Four of
the alternative promoters are located near the conventional start site and
comprise a brain cortex-specific promoter, a muscle-specific promoter located
100 kb downstream, a promoter which is used in Purkinje cells of the
cerebellum and located a further 100 kb downstream, and a lymphocyte-specific
promoter. Usage of these promoters results in large isoforms with a molecular
weight of 427 kDa (referred to as Dp427, where Dp = dystrophin protein, and
often given a suffix to indicate tissue specificity, e.g. Dp427-M to indicate
the muscle-specific isoform). The four Dp427 isoforms differ in their extreme
N-terminal amino acid sequence as a result of using four different
alternatives for exon 1.
In addition to the four alternative promoters encoding the conventional large
isoforms, at least four other alternative internal promoters can be used.
Transcription from these promoters uses only a downstream subset of the
exons, resulting in significantly smaller isoforms: a Dp260 isoform produced
in retinal cells; a Dp140 isoform produced by many cells in the brain and
kidney; a Dp116 isoform produced in Schwann cells; and a small Dp71 isoform
produced in many cell types. Note that the alternative usage of promoters
enforces alternative use of exons but that alternative splicing events which
are independent of differential promoter usage are also very common (see
Section 8.3.2). In the case of dystrophin, for example, additional isoform
complexity is introduced by alternative splicing, especially at the C
terminus.
8.3.2. Human genes often encode more than one product as a result of alternative
splicing and alternative polyadenylation events
In addition to differential use of promoters which can enforce alternative use of
exons, a variety of alternative RNA processing events can also result in
alternative isoforms. The primary mechanisms are alternative splicing events
(that is, distinct from those induced by differential promoter usage), and
alternative polyadenylation events. In many cases a combination of these
mechanisms can operate during the processing of transcripts from a single
gene. Together with the
additional possibility of differential promoter usage, these mechanisms can
result in very large numbers of isoforms for a single gene.
Alternative splicing
A large percentage of human genes undergo alternative splicing whereby different exon
combinations are included in transcripts from the same gene during RNA
processing. For many genes, numerous isoforms can be generated at the RNA
level, but often their functional significance is poorly understood. In some
genes, alternative splicing results in very considerable diversity in the
untranslated regions. For example, in the liver alternative splicing results
in at least eight different 5′ UTR sequences for human growth hormone
receptor mRNA (Pekhletsky et al., 1992), but the functional significance, if
any, is not understood.
Alternative splicing of coding sequence exons is also common and some of the
resulting protein isoforms have been shown to be tissue specific, so that
individual exons present in one isoform but not in others may be termed
‘muscle-specific’, ‘brain-specific’, etc. The different isoforms can provide
a variety of possibilities for altered functional properties but detailed
knowledge of the functional significance of the different isoforms is still
comparatively sparse (see Box 8.4).
The best understood model system for understanding the regulation of splicing
is the sex determination pathway in Drosophila which also
controls gene dosage. Alternative splicing is used in each branch of this
pathway to control the expression of transcriptional regulators or
chromatin-associated proteins that influence transcription, and both
positive and negative control of splicing is evident (Lopez, 1998). In
mammalian cells candidate splice regulators are the SR family of RNA-binding
proteins (which have a distinctive C-terminal domain rich in serine
(S)-arginine (R) dipeptides) and some hnRNP (heterogeneous nuclear
ribonucleoprotein particle) proteins. These proteins are known to promote
various steps in assembly of spliceosomes and they are also known to bind to
splicing enhancer sequences, regulatory sequences which can enhance splice
site recognition (Lopez, 1998).
Alternative polyadenylation
The usage of alternative polyadenylation signals is also quite common in
human mRNA, and different types of alternative polyadenylation have been
identified (see Edwards-Gilbert et al., 1997). In many genes, two or more
polyadenylation signals are found in the 3′ UTR and the alternatively
polyadenylated transcripts can show tissue specificity; in other cases,
alternative polyadenylation signals may be brought into play following
alternative splicing. As an example of the latter, a combination of
alternative splicing and alternative polyadenylation of the calcitonin gene
(CALC) results in tissue-specific expression of two isoforms. Calcitonin, a
circulating Ca2+ homeostatic hormone, is produced in the thyroid; the
calcitonin gene-related peptide (CGRP), which may have both neuromodulatory
and trophic activities, is synthesized in the hypothalamus.
8.3.3. RNA editing is a rare form of post-transcriptional processing whereby
base-specific changes are enzymatically introduced at the RNA level
RNA editing is a form of post-transcriptional processing which can involve
enzyme-mediated insertion or deletion of nucleotides or substitution of single
nucleotides at the RNA level. Insertion or deletion RNA editing
appears to be a peculiar property of gene expression in mitochondria of
kinetoplastid protozoa and slime molds. Substitution RNA editing is frequently
employed in some systems, such as the mitochondria and chloroplasts of vascular
plants where individual mRNAs may undergo multiple C → U or U
→ C editing events, and has also been observed in a few mammalian genes
(Ashkenas, 1997). At least four
different classes of RNA editing are known to occur in human cells:
- C → U editing. Human APOB lipoprotein mRNA editing has been well studied.
  In the liver the APOB gene encodes a 14.1 kb mRNA transcript and a 4536
  amino acid product, apoB100. However, in the intestine the same gene
  encodes a 7 kb mRNA which contains a premature stop codon not present in
  the gene and encodes a product, apoB48, which is identical in sequence to
  the first 2152 amino acids of apoB100. A specific cytosine deaminase,
  Apobec1, converts a single cytosine at nucleotide 6666 in the intestinal
  APOB mRNA to uridine, thereby generating a stop codon.
- A → I editing. Genes encoding some ligand-gated ion channels including
  glutamate receptors and related proteins are subject to this type of mRNA
  editing. An adenosine is deaminated to give inosine (I), a base not
  normally present in mRNA (the amino group at carbon 6 of adenosine is
  replaced by a C=O carbonyl group). Inosine base pairs preferentially with
  cytosine and also interacts with ribosomes during translation as if it
  were a G. In the case of the glutamate receptor B gene, for example, RNA
  editing replaces a CAG (glutamine) codon by CIG, which is translated as if
  it were the CGG codon (arginine). This type of editing, which brings about
  a Gln → Arg substitution, is often referred to as Q/R editing after the
  single letter code for the two amino acids involved.
- Other classes of RNA editing. Two other documented forms of editing in
  human mRNA are the U → C editing in mRNA from the WT1 Wilms' tumor gene
  and U → A editing in α-galactosidase mRNA (Ashkenas, 1997).
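The effect of the first two classes of editing on decoding can be illustrated with a short sketch. It is not a model of the enzymes involved; the function names are hypothetical, the codon table holds only the four codons needed here, and the APOB example assumes that the edited codon is CAA (glutamine), which the text above does not state explicitly.

```python
# Only the codons needed for the two examples; not a full genetic code.
CODON_TABLE = {"CAA": "Gln", "CAG": "Gln", "CGG": "Arg", "UAA": "Stop"}


def edit(codon: str, pos: int, old: str, new: str) -> str:
    """Substitute one base in a codon, checking the expected original base."""
    if codon[pos] != old:
        raise ValueError(f"expected {old} at position {pos} of {codon}")
    return codon[:pos] + new + codon[pos + 1:]


def translate(codon: str) -> str:
    """Decode a codon, reading inosine (I) as if it were G."""
    return CODON_TABLE[codon.replace("I", "G")]


if __name__ == "__main__":
    # C -> U editing (APOB, assumed CAA codon): Gln becomes a stop codon.
    apob_edited = edit("CAA", 0, "C", "U")
    print("CAA ->", apob_edited, translate("CAA"), "->", translate(apob_edited))

    # A -> I editing (glutamate receptor B): CAG is read as CGG, Gln -> Arg.
    glur_edited = edit("CAG", 1, "A", "I")
    print("CAG ->", glur_edited, translate("CAG"), "->", translate(glur_edited))
```

Treating inosine as G at the decoding step, rather than rewriting the sequence, keeps the distinction made in the text: the edited mRNA contains I, and it is the ribosome that reads it as G.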
8.4. Asymmetry as a means of establishing differential gene expression and DNA
methylation as a means of perpetuating differential expression
The concept of tissue specificity of human gene expression is long established. What
is much less clear is how such patterns get laid down initially. Since the DNA
content of all nucleated cells in an organism is virtually identical, genetic
mechanisms cannot explain how differential gene expression first develops in cells.
To explain this, C. H. Waddington invoked epigenetic mechanisms of gene
control during development. In recent times, a variety of epigenetic mechanisms have
been identified, including ones which can perpetuate particular states of gene
expression in somatic cell lineages.
8.4.1. Selective gene expression in cells of mammalian embryos most likely develops
in response to short range cell-cell signaling events
In order to explain subsequent tissue-, cell- and developmental stage-specific
patterns of expression, some mechanism is required to set up an asymmetry or
axis in the fertilized egg cell or in very early development. In
Drosophila, the egg is inherently asymmetrical because of
transfer of gene products from asymmetrically sited nurse cells. The embryo
develops initially as a multinucleate syncytium (effectively one big cell) and
regionalization depends on the response of individual nuclei to long-range
gradients of regulatory molecules. In mammals, however, the egg cell is
relatively small and early embryonic development creates an apparently
symmetrical aggregate of individual cells. Nevertheless, development becomes
asymmetric.
The generation of asymmetry in mammalian cells could derive from early positional
cues. Some aspects of early development are inherently asymmetrical including
the point of entry of the sperm during fertilization, the attachment of the
embryo to the uterine wall during implantation and the location of cells with
respect to their neighbors. As the embryo develops into a ball of cells, and
later on as more complex structures develop, individual cells will vary in the
number of cell neighbors available. Short-range intercellular signaling events
(including direct cell-cell signaling) can provide a means of identifying cell
position, and triggering differential gene expression. For example, if an
intercellular signaling molecule has a range
of, say, one cell diameter, then the cells at the outside of the blastula will
receive different signals from those surrounded by neighbors on all sides, and
the different positional cues may be translated into differential gene
expression. As particular cell systems develop during, for example,
organogenesis (mostly accomplished between the 4th and 9th embryonic weeks),
particular cell type growth or differentiation factors may then induce the
expression of developmental stage-specific and/or tissue-specific transcription
factors.
8.4.2. Vertebrate DNA methylation is very largely confined to CpG dinucleotides and
patterns of DNA methylation can be inherited when cells divide
Once differential expression patterns have been set up, epigenetic mechanisms can
ensure that differential expression patterns are stably inherited when cells
divide. DNA methylation is thought to play a major role in this respect,
permitting the stable transmission from a diploid cell to daughter cells of
chromatin states which repress gene expression. However, the precise function of
DNA methylation in eukaryotes is still imperfectly understood and clearly shows
species differences. Some organisms, for example, have no detectable DNA
methylation, as in the case of Drosophila, C. elegans and the
yeast Saccharomyces cerevisiae. In those organisms where DNA
methylation does occur, the patterns and functions of DNA methylation may
differ.
Patterns of DNA methylation in vertebrates
The pattern of vertebrate DNA methylation differs from that in bacterial
cells. In the latter, adenine and cytosine can both be methylated but in
vertebrates methylation of DNA is restricted to cytosine residues. Only
about 3% of the cytosines in human DNA are methylated, but most that are
methylated are found in the CpG dinucleotide (that is, the methylated
cytosines are almost always ones whose 3′ carbon atom is linked by
a phosphodiester bond to the 5′ carbon atom of a guanine). In
addition, a much smaller percentage of methylated cytosines occur within the
sequence CpNpG.
Figure 8.17. The CpG dinucleotide is underrepresented in vertebrate DNA
because it is prone to methylation and deaminated
5-methylcytosine is subject to ineffective DNA repair
(A) Cytosine occurring in the sequence
5′-CpG-3′ is a target for methylation at
carbon atom 5 of the cytosine. The deaminated products of cytosine
and its methylated derivative 5-methylcytosine are
differentially recognized by DNA repair enzymes.
(B) The methylation pattern of CpGs is perpetuated
by a requirement for the specific methylase to recognize a
hemimethylated target sequence. The sequence
CpG has dyad symmetry. Following methylation of a hemimethylated
target (i.e. methylated on one strand only), the two methylated
strands will separate at DNA duplication and act as templates
for the synthesis of two unmethylated daughter strands. The
resulting daughter duplexes will now provide new hemimethylated
targets for continuing the same pattern of methylation.
(C) Deamination of 5-methylcytosine in the
sequence CpG results in conversion of CpG dinucleotides to TpG
and CpA dinucleotides.
Cytosine residues occurring in CpG dinucleotides in vertebrate DNA are
targets for methylation by a specific cytosine methyltransferase.
Methylation occurs at carbon atom 5 of the cytosine to generate
5-methylcytosine, which is chemically unstable and can spontaneously
deaminate to give thymine. Because thymine is a normal DNA base, this change
is repaired inefficiently. Over long periods of evolutionary time, the
number of CpGs in vertebrate DNA has therefore gradually been eroded,
although regions of the normal (expected) CpG frequency are known and often
mark transcriptionally active sequences (CpG islands, see Box 8.5).
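The idea of an "expected" CpG frequency can be made concrete with a small calculation. The sketch below uses one commonly used convention, the observed/expected ratio (number of CpG × sequence length) / (number of C × number of G); that formula and both function names are assumptions introduced here for illustration and are not taken from the text. A second helper mimics the outcome of 5-methylcytosine deamination on either strand.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0  # ratio is undefined without both bases; report 0
    return seq.count("CG") * len(seq) / (n_c * n_g)


def deaminate_methylated_cpgs(seq: str, strand: str = "top") -> str:
    """Mutate every CpG as if its 5-methylcytosine had deaminated.

    strand="top": the C on the strand shown is lost (CpG -> TpG).
    strand="bottom": the C on the complementary strand is lost, which
    reads as CpG -> CpA on the strand shown.
    """
    seq = seq.upper()
    if strand == "top":
        return seq.replace("CG", "TG")
    if strand == "bottom":
        return seq.replace("CG", "CA")
    raise ValueError("strand must be 'top' or 'bottom'")


if __name__ == "__main__":
    before = "ACGTCGCGTA"
    after = deaminate_methylated_cpgs(before)
    print(before, cpg_obs_exp(before))
    print(after, cpg_obs_exp(after))
```

A ratio near 1 corresponds to the expected CpG frequency, and values well below 1 correspond to the depletion described above. Real analyses compute the ratio over sliding windows of genomic sequence rather than over a toy string.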
Maintenance and de novo methylation during development
Unlike bacterial methylases, vertebrate cytosine methyltransferases show a
strong preference for recognizing a hemi-methylated DNA target (i.e. one
that is already methylated on one strand only). The sequence CpG shows dyad
symmetry and so, following DNA replication, the newly synthesized DNA
strands will receive the same CpG methylation pattern as the parental DNA.
As a result, the CpG methylation pattern can be stably transmitted to
daughter cells. The perpetuation of a pre-existing methylation pattern is
sometimes known as maintenance methylation and is carried out in mammalian
cells by the product of the Dnmt1 gene.
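The logic of maintenance methylation can be sketched as a toy simulation. The representation below is an assumption made for illustration: a duplex is a pair of lists, one per strand, holding True where the cytosine of a given CpG site is methylated, and the function names are hypothetical. Replication pairs each parental strand with a new unmethylated strand; the maintenance step then methylates the new strand only at sites that are hemimethylated.

```python
from typing import List, Tuple

# One boolean per CpG site on each strand: True = methylated cytosine.
Duplex = Tuple[List[bool], List[bool]]


def replicate(duplex: Duplex) -> Tuple[Duplex, Duplex]:
    """Semiconservative replication: new strands start unmethylated."""
    top, bottom = duplex
    new_strand = [False] * len(top)
    return (list(top), list(new_strand)), (list(new_strand), list(bottom))


def maintain(duplex: Duplex) -> Duplex:
    """Methylate hemimethylated sites; fully unmethylated sites stay so."""
    top, bottom = duplex
    pattern = [t or b for t, b in zip(top, bottom)]
    return list(pattern), list(pattern)


if __name__ == "__main__":
    parent: Duplex = ([True, False, True], [True, False, True])
    for daughter in replicate(parent):
        print("hemimethylated:", daughter, "-> maintained:", maintain(daughter))
```

Because the maintenance step never methylates a site that is unmethylated on both strands, the parental pattern, including its unmethylated sites, is what both daughter duplexes inherit.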
Figure 8.19. Changes in DNA methylation during mammalian
development
Developmental stages for gametogenesis and early embryo
development are expanded for clarity; those for later
development are contracted, as indicated by double slashes. Note
the very rapid changes in DNA methylation during: (i)
gametogenesis - de novo
methylation gives rise to substantially methylated genomes in
the sperm and egg (albeit with differences in both the overall
level of methylation and the pattern of methylation in these
genomes - see text), and in (ii) the
early embryo where a wave of genome-wide
demethylation occurs at the preimplantation stage (morula and
early blastula), and is succeeded shortly afterwards by
large-scale de novo methylation beginning at
the pregastrulation stage. The latter is particularly pronounced
in somatic lineages, and to a lesser extent in trophoblast
lineages giving rise to placenta and yolk sac, but does not
occur in the primordial germ cells (the cells of the embryo
which will eventually give rise to sperm and egg cells).
The pattern of 5-methylcytosine distribution in the genome of differentiated
somatic cells varies according to cell type but maintenance methylation
ensures that methylation patterns in individual somatic cell lineages are
quite stable. During gametogenesis and in the developing embryo, however,
there are dramatic changes in methylation (Razin and Kafri, 1994). The
genomes of the primordial germ cells of the embryo are not methylated to any
significant extent. After gonadal differentiation and as the germ cells
begin to develop, de novo methylation occurs leading to substantial
methylation of the DNA of mammalian sperm and egg cells. The sperm genome is
more heavily methylated than the egg's genome, and sex-specific differences
in methylation patterns are found, notably at imprinted loci (see Mertineit
et al., 1998 for references). The product of the Dnmt1 gene, in addition to
being the predominant maintenance DNA methyltransferase in mammalian cells,
may also be the major de novo methyltransferase. It is highly expressed in
male germ cells, mature oocytes and in the early embryo. Dnmt1 gene
expression has been shown to be subject to sex-specific regulation with
oocyte- and spermatocyte-specific promoters introducing oocyte- and
spermatocyte-specific exons leading to different gene products (Mertineit et
al., 1998).
The genome of the fertilized oocyte is an aggregate of the sperm and egg
genomes and so it and the very early embryo are substantially methylated
with methylation differences at paternal and maternal alleles of many genes.
Later on, at the morula and early blastula stages in the preimplantation
embryo, genome-wide demethylation occurs. Later still, at the
pregastrulation stage, there is widespread de novo methylation. However, the
extent of this methylation varies in different cell lineages:
- the somatic cell lineage is heavily methylated;
- trophoblast-derived lineages which give rise to the placenta, yolk sac,
  etc., are undermethylated;
- early primordial germ cells are spared; their genomic DNA remains very
  largely unmethylated until after gonadal differentiation and as the germ
  cells develop, whereupon widespread de novo methylation occurs.
8.4.3. DNA methylation in animals has been thought to act as a form of host defense
against transposons as well as a way of perpetuating patterns of transcriptional
repression
Although not all eukaryotes appear to be subject to DNA methylation, its function
in animal cells does appear to be critically important, and targeted mutagenesis
of the cytosine methyltransferase gene in mice results in embryonic lethality.
The precise function of DNA methylation in animal cells, however, remains
unclear. Current views have focused in particular on two aspects of animal
cells: the genome size (animals have comparatively large genomes with large
numbers of genes, and also large numbers of highly repetitive DNA families
belonging to the transposon class); and the mode of development (especially the
variation in terms of lifespan and rate of cell turnover). Two quite contrasting
views regarding the primary function of DNA methylation in animal cells have
been the subject of much controversy: the host defense model and
the gene regulation model.
Host defense as a primary function for DNA methylation
Like the restriction-modification function of DNA methylation in bacteria
(see Box 4.1), the
host-defense model envisages that the primary function of DNA methylation in
animal cells is to confer a form of genome protection, but in this case
checking the spread of transposons (Yoder et al., 1997). About one-third of
the DNA
sequence in the human genome can be classified as belonging to
(retro)transposon families and a small fraction of these sequences in the
human genome and other genomes is known to be actively transposing (see
Section 7.4). Transposon families
in the human and other genomes are known to be heavily methylated (about 90%
of the 5-methylcytosines are thought to be located in retrotransposon
families) and so DNA methylation has been viewed as a mechanism for
repressing such transposition, which if left unchecked could be expected to
be damaging to cells. However, recently obtained data from an invertebrate
chordate, Ciona intestinalis, appear to be inconsistent
with the genome defense model: multiple copies of an apparently active
retrotransposon and a large fraction of highly repeated SINEs were
predominantly unmethylated, while genes, by contrast, appeared to be
methylated (Simmen et al., 1999).
Gene regulation as the primary function for DNA methylation
DNA methylation in vertebrates has been viewed as a mechanism for silencing
transcription and may constitute a default position. DNA sequences which are
transcriptionally active need to be unmethylated (at least at the
promoter regions). While DNA methylation in invertebrates may serve to
repress transposons and other repeated sequence families (see above), it may
have acquired a special role in vertebrates as a mechanism for regulating
expression of endogenous genes and reducing transcriptional noise (by
silencing a large fraction of genes whose activity is not required in a
cell). By reducing unnecessary gene expression, DNA methylation may have
permitted the increase in gene number and in complexity that characterizes
vertebrates (Bird, 1995). The
counterargument is that the methylation status of the 5′ regions
of tissue-specific genes cannot be correlated with expression in different
tissues, and that the role of methylation in gene expression is in
specialized biological functions resulting from mechanisms (e.g. imprinting)
which use allele-specific gene expression (Walsh and Bestor, 1999).
DNA methylation and gene expression
Table 8.7
Features associated with transcriptionally active and inactive
chromatin
| Feature | Transcriptionally active chromatin | Transcriptionally inactive chromatin |
| --- | --- | --- |
| Chromatin conformation | Open, extended conformation | Highly condensed conformation; particularly apparent in heterochromatin (both facultative and constitutive; see Section 3.5) |
| DNA methylation | Relatively unmethylated, especially at promoter regions | Methylated, including at promoter regions |
| Histone acetylation | Acetylated histones | Deacetylated histones |
The DNA of transcriptionally active and inactive chromatin differs in a
number of features including the degree of compaction and the extent of its
methylation (Table 8.7). While methylation of CpG islands downstream of
promoters does not block continued transcription through these regions
(Jones, 1999), there is no doubt that methylated promoter regions are
correlated with transcriptional silencing. In addition, the extent of
histone acetylation is an important factor. Specific histone
acetyltransferases add acetyl groups to lysine residues close to the N
terminus of histone proteins. The acetylated N termini then form tails that
protrude from the nucleosome core. As the acetylated histones are thought to
have a reduced affinity for the DNA, and possibly for each other, the
chromatin may be able to adopt a more open structure that is more suited to
gene expression. Deacetylation of the histones, however, promotes repression
of gene expression, presumably because the chromatin can become more
condensed.
Figure 8.21. Transcriptional repression by histone deacetylation may be
mediated by DNA methylation
CpG dinucleotides are targets for DNA methylation and, in turn,
methylated CpGs are targets for specific binding by proteins
such as MeCP2, which acts as a transcriptional repressor and
recruits a corepressor complex consisting of the transcriptional
repressor mSin3A and histone deacetylases. The latter
remove acetyl groups from histones. The reverse process
involves sequential histone acetylation then DNA demethylation
(see Ng and Bird, 1999).
Recently, the processes of DNA methylation and histone deacetylation have
been shown to be linked. Repression at methylated CpG sequences in promoter
regions appears to be mediated by proteins which specifically bind to
methylated CpG. Two of these proteins have been identified, MeCP1 and MeCP2
(methylated CpG binding proteins 1 and 2), and the latter has been shown
to be essential for embryonic development and to function as a
transcriptional repressor. The ability of MeCP2 to repress gene expression
has been shown to involve a histone deacetylase complex (Ng and Bird, 1999).
One possible model envisages that an initial signal for transcriptional
repression is the binding of MeCP2 to methylated CpG promoter sequences. The
bound MeCP2 protein is then recognized by a complex consisting of a
transcriptional repressor and a histone deacetylase which removes the acetyl
groups from the N termini of the histones, so that the chromatin becomes more
condensed (Figure 8.21).
8.5. Long-range control of gene expression and imprinting
8.5.1. Chromatin structure may exert long-range control over gene expression
A predominant theme in eukaryotic gene expression, distinguishing it from
bacterial gene expression, has been that genes are individually
transcribed. Promoters and related upstream elements typically control
expression of a single gene with a transcription start point located within 1 kb
of the control element. Some cis-acting elements, however,
exert long-range control over a much larger chromosomal region and there is
increasing evidence for coordinate regulation of gene clusters. Studies where
genes are repositioned elsewhere in the genome have also suggested that
chromosomes are organized into functional domains of gene expression (chromatin
domains). For example, when genes are translocated to new chromosomal regions
(either as a result of spontaneous chromosome breakage events, or by genetically
manipulating model organisms - see Section 21.3), aberrant gene expression may often occur,
even though the entire gene and the required control sequences in its
immediate flanking sequences are preserved intact. Neighboring chromosome
domains are envisaged to be separated by boundary elements (also called insulators) which act as barriers to the effects of distal enhancers
and silencers (Geyer, 1997).
Competition for enhancers or silencers
Sometimes long-range control of gene expression appears to depend on
competition between clustered genes for an enhancer. This appears to be a
feature of globin gene expression as described in Section 8.5.2.
Heterochromatin-induced position effects
Studies of chromosomal rearrangements in Drosophila have
shown that proximity to centromeres, telomeres or heterochromatic blocks may
suppress gene expression, presumably by altering the structure of a large
chromatin domain. Facioscapulohumeral muscular dystrophy (FSHD; MIM 158900)
is a possible example of a similar position effect in man. The gene for this
autosomal dominant progressive neuromuscular disease maps close to the
telomere of chromosome 4q. When Southern blots of
EcoRI-digested DNA are hybridized to a subtelomeric probe
p13H-11, a very large hybridizing fragment of over 30 kb is seen. DNA from
FSHD patients consistently shows smaller bands of 14–28 kb. These
patients have a reduced copy number of a 3.2 kb repetitive sequence that is
recognized by the probe, and de novo deletions of stretches
of the repeats have been observed in FSHD patients with no previous family
history.
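The link between fragment size and repeat copy number can be made concrete with
a little arithmetic. The sketch below is illustrative only: it assumes that the
whole difference in EcoRI fragment size is due to loss of complete 3.2 kb repeat
units, and the function name is hypothetical.

```python
REPEAT_UNIT_KB = 3.2  # size of the tandem repeat recognized by p13H-11 (from the text)


def repeats_lost(normal_fragment_kb, patient_fragment_kb):
    """Estimate how many 3.2 kb repeat units separate two EcoRI fragments.

    Assumption (not stated in the text): the size difference is caused
    solely by deletion of whole repeat units.
    """
    if patient_fragment_kb > normal_fragment_kb:
        raise ValueError("patient fragment is expected to be the smaller one")
    return (normal_fragment_kb - patient_fragment_kb) / REPEAT_UNIT_KB


# Compare a 30 kb fragment with the smallest and largest patient bands quoted above.
for patient_kb in (14.0, 28.0):
    print(patient_kb, round(repeats_lost(30.0, patient_kb), 1))
```

Because the normal fragment is described only as "over 30 kb", figures obtained
this way are lower bounds on the number of repeat units lost.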
Hopes that the 3.2 kb sequence would contain the FSHD gene,
however, have been disappointed. Even though it contains part of a
homeodomain (Hewitt et
al., 1994), there is no evidence that any part of it is
transcribed or expressed. Most probably the FSHD gene is
located proximally to the tandem 3.2 kb repeats, and the deletions move it
closer to the 4q telomere, where it is silenced by a position effect.
Other position effects
Evidence of long-range effects controlling gene expression over large
chromosomal domains has emerged from studies of disease-associated
chromosome breakpoints in humans (Kleinjan
and van Heyningen, 1998). Examples are aniridia (AN1; MIM
106210), which is
caused by loss-of-function mutations of the PAX6 gene on
11p13, and campomelic dysplasia (CMPD1; MIM 211970), which is caused by mutations in
the SOX9 gene on 17q24. In each case, affected patients are
known who have clearly causative chromosomal breaks, but the breakpoints may
be very distant (hundreds of kilobases) from the gene whose expression is
affected and do not physically disrupt it. It seems likely that expression
of the gene is suppressed by a long-range effect analogous to the classic
position effects described above, reflecting the novel chromosomal
environment created by the translocation.
Prader-Willi and Angelman syndromes (see Section 16.4.2) bring together position effects, imprinting and
DNA methylation. A cis-acting sequence analogous to the
globin locus control region has been identified which governs
parent-specific methylation and gene expression of a megabase-size
chromosomal region at 15q11.
X inactivation
X chromosome inactivation in mammals appears to be initiated by a single
gene, XIST, which is uniquely expressed on the inactivated
X chromosome (see Section 8.5.6).
This effect is not understood but must be mediated by some sort of
long-range chromatin structural change. This is so because a diffusible
XIST-mediated agent would not be able to affect just
the X chromosome on which the XIST gene is expressed.
8.5.2. Expression of individual genes in gene clusters may be coordinated by a
common locus control region
Figure 8.22. Human hemoglobin switching occurs at two distinct developmental
stages
Some human gene clusters show evidence of coordinated expression of the
individual genes in the cluster. For example, individual genes in the
α-globin, β-globin and the four HOX gene
clusters are activated sequentially in a temporal sequence that corresponds
exactly with their linear order on the chromosome. In the case of the globin
genes, there is a clear developmental stage-specific expression: different genes
can be active at the embryonic, fetal or adult stages to generate slightly
different forms of hemoglobin (hemoglobin switching; Figure 8.22).
Figure 8.23. Gene expression in the α- and β-globin gene
clusters is controlled by common locus control regions
(A) Organization of the human α- and
β-globin gene clusters. The locus control regions (LCRs)
consist of one or more erythroid-specific DNase I-hypersensitive
sites (HS-40, etc.) located upstream of the cluster. Arrows mark the
direction of transcription of expressed genes. The functional status
of the θ-globin gene is
uncertain: it is expressed, but may be an expressed pseudogene (see
Box 7.3).
(B) Regulation of gene expression by the
β-globin LCR. The strong blue arrows indicate a powerful
enhancer effect by the LCR on the indicated genes, resulting in a
high expression level; dotted blue arrows indicate correspondingly
weak effects.
Recently, it has become apparent that the expression of the genes in each of the
two human globin gene clusters is coordinated by a dominant control region, the
locus control region (LCR), which is located some distance upstream of the gene
cluster (see Grosveld et al., 1993). Such cluster-specific LCRs are
thought to organize the cluster into an active chromatin domain and to act as
enhancers of globin gene transcription. The open conformation of
transcriptionally active chromatin domains makes them more accessible to
cleavage by the enzyme DNase I. Consistent with this relationship, the
β-globin LCR has been considered to comprise short sequences at three
major erythroid-specific DNase I-hypersensitive sites (HS2, HS3 and HS4)
clustered over a 15 kb region located about 50–60 kb upstream of the
β-globin gene, while the α-globin LCR has been localized to an
erythroid-specific DNase I-hypersensitive site, HS-40, located 60 kb
upstream of the α-globin gene (Figure 8.23A). Each site marks the location of
what is effectively an enhancer sequence of about 200–300 bp of DNA which
contains short cis-acting sequence elements, including multiple sequence
elements recognized by erythroid-specific transcription factors. Without the
respective LCRs, globin gene expression is negligible and, in the case of the
β-globin LCR, it appears that the HS2, HS3 and HS4 elements interact
with each other to form a larger complex that interacts with the individual
globin genes.
Other DNase I-hypersensitive sites are located at the promoters of the globin
genes, but show developmental stage specificity. For example, in fetal liver,
the promoters of the two γ genes and of the β and δ genes
are marked by DNase I-hypersensitive sites but, in adult bone marrow, the two
γ genes are no longer transcriptionally active and their promoters no
longer reveal DNase I-hypersensitive sites. Developmental stage-specific
switching in globin gene expression is then thought to be accomplished by
competition between the globin genes for interaction with their respective LCR
and stage-specific activation of gene-specific silencer elements. For example,
transcription of the ε-globin gene (HBE1) is
preferentially stimulated by the neighboring LCR at the embryonic stage. In the
fetus, however, ε-globin expression is suppressed following activation
of a silencer and γ-globin expression becomes dominant (Figure 8.23B).
In addition to the human globin LCRs, a number of other LCRs have
been identified (see Kioussis and Festenstein,
1997). However, the proposed role of the human β-globin LCR (which
was inferred from analysis of human transgenes) has been challenged by gene
targeting studies in mice which show that the β-globin LCR has a
contributory function rather than a dominant one, and that it is not required
for initiation of DNase sensitivity and expression of the genes in the mouse
β-globin cluster. These contradictory findings may mean that
there is a functional difference between the human and murine LCRs (see
Grosveld, 1999 for a review).
8.5.3. Some human genes show selective expression of only one of the two parental
alleles
X-linked genes in females and all autosomal genes are biallelic because both
father and mother normally contribute one allele each. In males possessing one X
chromosome and one Y chromosome, the great majority of sex-linked genes are
monoallelic: most of the many genes on the X do not have a
functional homolog on the Y chromosome; and some of the few genes on the Y
chromosome are known to be Y-specific, for example SRY, the
major male sex-determining locus. A few genes on the Y chromosome do have
functional homologs on the X chromosome and so are biallelic. In some cases of
X-Y homologous loci, both homologs are normally functional (see Section 14.3.1 and Figure 14.9).
We are accustomed to assuming that both the paternal and maternal alleles of
biallelic genes are expressed, unless one or both copies have sustained
mutations which affect expression. Clearly the expression can be tissue-specific
so that in some cells both parental alleles are strongly expressed; in others,
neither gene copy is apparently expressed. Thus, although there may be cell
type-specific differences in expression, there is no discrimination between the
capacity of the two parental alleles to be expressed, other than that due to
genetic (mutational) differences between them. However, in humans and other
mammals, several biallelic genes are known where the expression of one parental
allele, either the paternal or the maternal allele but not both, is
normally repressed in some cells (allelic exclusion). In such cells the relevant gene is
said to exhibit functional hemizygosity: only one half of the maximum gene
product is normally obtained even though the sequences of both
parental alleles are perfectly consistent with normal gene expression and
may even be identical. In some cases the allelic exclusion may be a
property of select cells or tissues while in other cells of the same individual
both alleles may be expressed normally.
Although initially considered a rarity, monoallelic expression of biallelic genes
has been demonstrated for a growing number of human genes. A variety of
mechanisms can be responsible, but they fall into two broad classes (see Chess,
1998; Ohlsson et al., 1998):
- Allelic exclusion according to parent of origin (imprinting). In some cases,
  the choice of which of the two inherited copies is expressed is not random.
  This means that for some genes the allele whose expression is repressed is
  always the paternally inherited allele; in others it is always the maternally
  inherited allele (see Section 8.5.4).
- Allelic exclusion independent of parent of origin. Here the decision as to
  which of the two alleles is repressed is initially made randomly, but
  afterwards that pattern of allelic exclusion is transmitted stably to daughter
  cells following cell division. A variety of different mechanisms may be
  involved (see Box 8.6). In some cases, complex gene regulation may be
  required. For example, olfactory receptor genes are found in large clusters or
  arrays in mammalian genomes. In an individual olfactory neuron only one
  allelic array of olfactory receptor genes is active (see Chess, 1998). A
  unique form of control is the programmed DNA rearrangements which are required
  for individual cell-specific expression of the immunoglobulin genes in B cells
  and the T cell receptor genes in T lymphocytes. Because of the complexity of
  the latter mechanisms they are discussed separately in Section 8.6.
8.5.4. Genomic imprinting involves differences in the expression of alleles
according to parent of origin
Various observations in mammals have suggested that the maternal and paternal
genomes in an individual are not equivalent (Box 8.7). In addition to genetic
differences between the DNA of the sperm and oocyte genomes, there are also
epigenetic differences. A major difference is in both the total amount of DNA
methylation (the sperm genome is more extensively methylated than the oocyte
genome) and the pattern of DNA methylation in specific DNA sequence classes.
For example, LINE-1 sequences are highly methylated in sperm cells but only
partially methylated in the oocyte (Razin and Kafri, 1994; Yoder et al., 1997).
At some individual gene loci, too, there are major differences between the
extent of methylation of paternal and maternal alleles. For example, the
paternal allele of the H19 gene is heavily methylated; the maternal allele is
undermethylated.
As suggested by the observations in Box 8.7, differences between the paternal
and maternal genomes lead to differences in expression between paternal and
maternal alleles. Genomic imprinting (also called gametic or parental
imprinting) in mammals describes the situation where there is nonequivalence in
expression of alleles at certain gene loci, dependent on the parent of origin
(Reik and Walter, 1998; Brannan and Bartolomei, 1999; Tilghman, 1999). In all
(or at least some) of the tissues where the gene is expressed, the expression of
either the paternally inherited allele or the maternally inherited allele is
consistently repressed, resulting in monoallelic expression. The same pattern of
monoallelic expression can be faithfully transmitted to daughter cells following
cell division. However, as the nucleotide sequence of the allele whose
expression is repressed may be perfectly consistent with gene expression (and
may even be identical to that of the expressed allele), this is an epigenetic
phenomenon, not a genetic one.
Prevalence and evolution of imprinting
Most human genes are not subject to imprinting, otherwise we would not see so
many simple mendelian characters. Systematic surveys have been made to
identify imprinted chromosomal regions in the mouse. Unlike in humans, all
mouse chromosomes are acrocentric and Robertsonian translocations can permit
crosses to be set up which produce offspring having both copies of one
particular chromosome derived from a single parent (uniparental disomy,
UPD, see Section
2.6.4). These reveal that UPD for some chromosomes has no
phenotypic effect; for others it produces abnormal phenotypes. The abnormal
phenotypes are sometimes complementary for different parental origins, e.g.
overgrowth is often seen in paternal UPD and growth retardation in maternal
UPD. For some chromosomes, UPD is lethal.
Figure 8.24. Imprinted gene clusters in 11p15.5 and 15q11-q13
Genes not known to be imprinted are shown in black; imprinted genes are in blue.
Genes shown as solid blue boxes, e.g. KCNQ1, UBE3A, show preferential repression
of paternal alleles (so that in some tissues only the maternal allele is
expressed). Genes shown in open blue boxes, e.g. IGF2, ZNF127, have the opposite
pattern: preferential repression of maternal alleles. Arrows indicate direction
of transcription. IC denotes imprinting center (see text). The 15q11-q13 region
has been less well studied and other genes are likely to be found there. Note
that some of the genes have other names in the literature, e.g. KCNQ1 (KvLQT1),
CDKN1C (p57 or KIP2), CD81 (TAPA1). Data from Lee et al. (1999) and Schweizer
et al. (1999).
Further dissection at the chromosomal and genetic levels shows that
imprinting is a property of a limited number of individual genes or small
chromosomal regions. Currently, a total of over 30 genes are known to be
imprinted in humans and mice (electronic reference 1), but the list can be
expected to grow. Thus far, two major clusters of imprinted genes are known in
the human genome: a 1 Mb region at 11p15 (encompassing the Beckwith-Wiedemann
region) contains at least seven imprinted genes which may be arranged in two
clusters (Lee et al., 1999); a 2.3 Mb cluster in the 15q11-q13 region
(encompassing the Prader-Willi and Angelman syndrome loci) also contains at
least seven imprinted genes (Schweizer et al., 1999; see Figure 8.24).
The imprinted gene clusters contain examples of neighboring genes with
different parental imprints, e.g. the H19 gene is expressed
only from the maternal chromosome 11 whilst the adjacent
IGF2 gene is expressed only from the paternal chromosome.
The great majority of known imprinted genes are autosomal. However, the
XIST gene which has a major role in establishing X
chromosome inactivation (see next section) may be considered an example of
an imprinted X-linked gene since expression of the maternally inherited
allele is preferentially repressed in trophoblast. An imprinted X-linked
gene which affects cognitive function has also been suggested from
differential behavior patterns in Turner syndrome. Girls with Turner
syndrome have only one X chromosome and no Y chromosome. If the X
chromosome is inherited from the mother, socially disruptive behavior is
common, but if it is inherited from the father, the girl shows behavior closer
to that considered normal for girls (Skuse et al., 1997).
Imprinting is known to occur in seed plants, some insects and mammals. No
major imprinting effect, as judged by phenotype, has been observed in some
model organisms such as Drosophila, C. elegans and the zebrafish, although the
potential for imprinting may exist in Drosophila. Mammals are unusual in the
way in which embryos are totally dependent on flow of nutrients from the
maternal placenta. As many imprinted genes are involved in regulating fetal
growth, one explanation envisages conflict between the parental genomes: the
paternal genome propagates itself best by creating an embryo which
aggressively removes nutrients from the mother; the maternal genome
suppresses this to protect the mother and spare some resources for future
offspring. As seen in cases of uniparental diploidy (see Box 8.7), paternal
genes are preferentially expressed in the trophoblast and extraembryonic
membranes, while maternal genes are preferentially expressed in the embryo.
8.5.5. The mechanism of genomic imprinting is unclear but a key component appears to
be DNA methylation
Table 8.8
Examples of tissue and developmental stage regulation of imprinted genes in mammals
| Gene | Repressed allele | Pattern of imprinting |
|---|---|---|
| IGF2 (insulin-like growth factor type 2) | Maternal | Imprinted in many tissues but biallelic expression in brain, adult liver and chondrocytes etc. |
| PEG1/MEST | Maternal | Imprinted in fetal tissue but biallelically expressed in adult blood |
| UBE3A (ubiquitin protein ligase 3) | Paternal | Imprinted exclusively in brain; biallelically expressed in other tissues |
| KCNQ1 (potassium channel involved) | Paternal | Imprinted in several tissues but biallelically expressed in heart |
| WT1 (Wilms' tumor gene) | Paternal | Frequently imprinted in cells of placenta and brain but biallelic expression in kidney |
To confirm imprinting of a gene, it is necessary to identify an individual who
is heterozygous for a sequence variant present in the mature mRNA. mRNA from
different tissues can then be checked for monoallelic or biallelic expression,
and the origin of each allele determined by typing the parents. For some genes,
this type of analysis has shown that imprinting is confined to only certain
tissues or to certain stages of development (Table 8.8). Thus imprinting allows
an extra level of control of gene expression, but it is not possible to
compress its functioning into a simple uniform story.
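The logic of this test can be sketched in a few lines of code. The function
below is a hypothetical illustration, not a laboratory protocol: the allele
labels, argument names and result strings are all assumptions made for the
example, and real assays must also allow for detection thresholds and partial
(leaky) repression.

```python
def classify_expression(genomic_alleles, mrna_alleles, father_alleles, mother_alleles):
    """Classify expression at a transcribed sequence variant in one tissue.

    genomic_alleles: the two alleles found in the individual's DNA, e.g. ("A", "G")
    mrna_alleles:    alleles detected in mRNA from the tissue being tested
    father_alleles / mother_alleles: the parents' genotypes at the same site
    """
    first, second = genomic_alleles
    if first == second:
        return "uninformative: individual is not heterozygous"
    expressed = set(mrna_alleles) & {first, second}
    if not expressed:
        return "not expressed"
    if len(expressed) == 2:
        return "biallelic"
    (allele,) = expressed  # exactly one allele is represented in the mRNA
    in_father = allele in father_alleles
    in_mother = allele in mother_alleles
    if in_father and not in_mother:
        return "monoallelic: paternal allele expressed"
    if in_mother and not in_father:
        return "monoallelic: maternal allele expressed"
    return "monoallelic: parental origin cannot be assigned"


# Hypothetical family: father A/A, mother G/G, child A/G.
print(classify_expression(("A", "G"), {"A"}, ("A", "A"), ("G", "G")))
print(classify_expression(("A", "G"), {"A", "G"}, ("A", "A"), ("G", "G")))
```

Running the same check on mRNA from several tissues or developmental stages of
one informative individual is what reveals the tissue-restricted patterns
summarized in Table 8.8.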
The above observations suggest that some mechanism must be able to distinguish
between maternally and paternally inherited alleles: as chromosomes pass through
the male and female germlines they must acquire some imprint to signal a
difference between paternal and maternal alleles in the developing organism. A
key component, at least in maintaining the imprinted status, is allele-specific
DNA methylation (Brannan and Bartolomei,
1999; Tilghman, 1999). The
imprinting of several imprinted genes has been shown to be disrupted in mutant
mice that are deficient in the Dnmt1 cytosine methyltransferase
gene and all imprinted genes are characterized by CG-rich regions of
differential methylation.
Figure 8.25. Genomic (gametic) imprinting requires erasure of the imprint in
the germline
The diagram illustrates the fate of a chromosome carrying two genes,
A and B, which are subject to imprinting: A is imprinted in the
female germline, B is imprinted in the male germline, as indicated
by asterisks. As a result, in diploid somatic cells A is imprinted
when present on a maternally inherited chromosome and B is imprinted
when present on a paternally inherited chromosome. An individual
chromosome may pass through the male and female germlines in
successive generations: a man may transmit a chromosome inherited
from his mother and a woman can transmit a chromosome inherited from
her father, as indicated by the gametes in the left panel. As a
result, there must be a mechanism whereby the old imprint is erased
from the germline prior to establishing a new sex-specific
imprint.
Intriguingly, Dnmt1 is known to have sex-specific exons (Section 8.4.2). In
oocytes this results in an oocyte-specific amino-terminal truncated protein
product which conceivably could specifically methylate the maternal alleles of
genes such as the insulin-like growth factor II receptor. The
spermatocyte-specific exon of Dnmt1 interferes with translation of Dnmt1
mRNA, and it is less clear how paternal-specific patterns of methylation could
be acquired. During development, the imprint would be expected to be stably
inherited at least for many rounds of DNA duplication (but see below). Clearly,
there must also be a mechanism for erasing the imprints during transmission
through the germline, as required when, for example, a man passes on an allele
which he has inherited from his mother (Figure 8.25). Again, however, one can
envisage the demethylation that occurs in the early embryo as one way of
achieving this, leaving the primordial germ cells essentially unmethylated.
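The erase-then-reset logic can be illustrated with a toy model of the two genes
A and B used in Figure 8.25. This is a minimal sketch under stated assumptions:
the dictionary, the function name and the representation of an imprint as a set
of gene names are all hypothetical, and the model says nothing about the
molecular mechanism involved.

```python
# Which germline places an imprint on each gene (genes A and B as in Figure 8.25).
IMPRINTED_IN = {"A": "female", "B": "male"}


def pass_through_germline(old_imprints, parent_sex):
    """Return the set of imprinted genes on a chromosome packaged into a gamete.

    old_imprints is accepted only to make the point that it is ignored: the
    inherited marks are erased before the new sex-specific imprint is set.
    """
    erased = set()  # step 1: erase whatever imprint the chromosome carried
    new_marks = {gene for gene, sex in IMPRINTED_IN.items() if sex == parent_sex}
    return erased | new_marks  # step 2: apply the imprint of the transmitting parent


# A man transmits the chromosome he inherited from his mother (imprinted at A).
maternal_chromosome = {"A"}
print(pass_through_germline(maternal_chromosome, "male"))
```

In this toy model the imprint in the gamete depends only on the sex of the
transmitting parent, never on the grandparental origin of the chromosome, which
is the behavior the figure is intended to convey.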
The timing of imprinting has become clearer recently. In the female germline,
the maternal imprint, including the maternal pattern of methylation, is likely
to be established during oocyte maturation, which is consistent with the finding
that the Dnmt1 protein is not detectable in nongrowing oocytes but is produced
abundantly in growing oocytes. In the male germline, the functional paternal
imprint is likely to be established prior to meiosis, possibly in the
postmitotic primary spermatocyte (Brannan and
Bartolomei, 1999).
Imprinted genes frequently reside in clusters, with genes expressed from opposite
parental chromosomes often located next to each other, and the clusters often
contain genes which appear to encode a mature RNA (see Figure 8.24). Adjacent
genes appear to be jointly regulated. In the case of the Prader-Willi/Angelman
syndrome cluster on 15q11-q13, for example, a single region adjacent to the
SNRPN gene, termed the imprinting center, is the dominant regulatory sequence
and appears to act over comparatively large distances (see Figure 8.24;
Brannan and Bartolomei, 1999; Tilghman, 1999). However, different
mechanisms may be found in different imprinted clusters, or even within a single
cluster. For example, the mouse H19, Igf2 and Ins2 genes are jointly regulated,
sharing two endodermal-specific enhancers that are located 3′ to
H19. But other genes in the cluster are not subject to this
control, suggesting that multiple control mechanisms may occur within an
imprinted gene cluster. Different imprinting mechanisms have been considered in
relation to H19/Igf2 regulation, including enhancer competition, but to explain
a variety of contradictory findings, an imprinting center adjacent to H19 has
been considered to function as a chromatin boundary (insulator) element
(Tilghman, 1999).
8.5.6. X chromosome inactivation in mammals involves very long-range cis-acting
repression of gene expression
Nature of X chromosome inactivation
X chromosome inactivation is a process that occurs in all mammals, resulting
in selective inactivation of alleles on one of the two X chromosomes in
females (Migeon, 1994; Lyon, 1999). It provides a mechanism
of dosage compensation which overcomes sex differences in the expected ratio
of autosomal gene dosage to X chromosome gene dosage (which is 2:1 in males
but 1:1 in females). Males with a single X chromosome are constitutionally
hemizygous for X chromosome genes, but females become functionally
hemizygous by inactivating one of the parental X chromosome alleles (see
also Section 2.2.3). Not all genes on
the X chromosome are subject to inactivation; genes which escape
X-inactivation include ones where there is a functional homolog on the Y
chromosome, and some genes where gene dosage does not seem to be important
(see Figure 14.12 for examples of
genes which escape X-inactivation).
In rare individuals with an abnormal number of X chromosomes (45,X; 47,XXX;
47,XXY, etc.), a single X chromosome remains active no matter how many are
present. By contrast, in triploid individuals either one or two X
chromosomes remain active and in tetraploids two X chromosomes remain
active. Thus, there must be some kind of counting mechanism
to ensure that one X chromosome remains active for every two sets of
autosomes.
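The counting rule can be written down as a small function. This is a sketch
rather than a model of the underlying mechanism, which the text does not
specify: the function name is hypothetical, and the treatment of odd ploidy
(rounding either down or up) is an assumption generalized from the triploid
observation quoted above.

```python
def active_x_counts(autosome_sets, x_chromosomes):
    """Possible numbers of active X chromosomes under the rule
    'one active X for every two sets of autosomes'."""
    if autosome_sets < 2 or x_chromosomes < 1:
        raise ValueError("sketch covers diploid or higher ploidy with at least one X")
    whole, remainder = divmod(autosome_sets, 2)
    # Even ploidy gives a single answer; odd ploidy may round down or up.
    candidates = {whole} if remainder == 0 else {whole, whole + 1}
    # A cell cannot keep more X chromosomes active than it possesses.
    return {min(n, x_chromosomes) for n in candidates}


for label, sets, xs in [("45,X", 2, 1), ("47,XXX", 2, 3),
                        ("triploid, XXY", 3, 2), ("tetraploid, XXXX", 4, 4)]:
    print(label, sorted(active_x_counts(sets, xs)))
```

The sex chromosome complements given for the triploid and tetraploid examples
are illustrative choices, not data from the text.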
In mammals, both X chromosomes are active in the early female embryo.
X-inactivation occurs at an early stage in development, being initiated at
the late blastula stage in mice, and most likely also in humans. In each
cell that will give rise to the female fetus, one of the two parental X
chromosomes is randomly inactivated (note
that trophoblast cells are an exception; the paternal X chromosome is
preferentially inactivated, which is a classical example of
tissue-restricted imprinting). After the paternal or maternal X chromosome
is inactivated in a cell, the same X chromosome usually remains inactive in
all progeny cells, that is the X chromosome inactivation pattern is clonally
inherited (see Figure 2.6). This
means that female mammals are mosaics, comprising mixtures of cell lines in
which the paternal X is inactivated and cell lines where the maternally
inherited X is inactivated. In addition to X chromosome inactivation in
female somatic cells, the X chromosome is known to be inactivated
transiently during gametogenesis in both males and females.
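The combination of a random initial choice and clonal inheritance is what makes
female mammals mosaics, and it can be illustrated with a toy simulation. The
sketch below is an assumption-laden illustration: the function name, the fixed
number of synchronous cell divisions and the equal probability of inactivating
either X are simplifications introduced for the example, and it applies only to
cells that give rise to the fetus, not to trophoblast.

```python
import random


def x_inactivation_mosaic(n_founders, divisions, seed=0):
    """Simulate random X inactivation followed by clonal inheritance.

    Each founder cell independently chooses which parental X to inactivate;
    every descendant of that cell then carries the same inactive X.
    Returns one list per clone, giving the inactive X in each descendant cell.
    """
    rng = random.Random(seed)  # seeded so that a run is reproducible
    clones = []
    for _ in range(n_founders):
        inactive = rng.choice(["paternal", "maternal"])  # random choice, made once
        clones.append([inactive] * (2 ** divisions))     # inherited unchanged
    return clones


clones = x_inactivation_mosaic(n_founders=8, divisions=3)
all_cells = [cell for clone in clones for cell in clone]
print("fraction with paternal X inactive:",
      all_cells.count("paternal") / len(all_cells))
```

Within any one clone every cell carries the same inactive X, while the tissue as
a whole is a mixture whose proportions depend on the small number of founder
cells present when the choice was made.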
Mechanism of X chromosome inactivation
The process of X chromosome inactivation is complex, and distinct molecular
mechanisms are involved in initiation of inactivation and maintenance of the
inactivation. The X-inactivation center (Xic), which in humans is located at
Xq13, controls the initiation and propagation of X-inactivation. At this
center, the XIST gene (called Xist in
rodents) encodes a mature 15 kb RNA product which is expressed only from the
inactive X chromosome. XIST/Xist is therefore another
example of a gene that is subject to monoallelic expression. In the cells of
the early embryo, the decision regarding which X chromosome to inactivate is
made randomly, and so the allelic exclusion which XIST/Xist
shows in these cells is independent of parent of origin.
XIST/Xist is essential for Xic function in initiating X
chromosome inactivation but is not required for maintaining X chromosome
inactivation. Somehow cis-limited spreading of this RNA
product acts so as to coat the inactivated X chromosome over very long
distances. In rodents, coating of the Xist RNA gives the
inactivated X chromosome a banded pattern suggesting a preferential
association with gene-rich Giemsa-minus regions. However, the mechanism of
ensuring inactivation of genes on the inactive X but not on the active X is
unknown (see Duthie et
al., 1999 for possible models).
Although the Xist gene is essential for Xic function,
Xist alone is not sufficient. The X-controlling element
(Xce) affects the choice of which X chromosome remains active, and is
distinct from Xist, being located 3′ to it. In
addition, deletion of a 65-kb region 3′ to Xist
produces an effect which suggests that elements involved in the counting
mechanism lie distal and 3′ to Xist. Recently,
another gene has been identified as being transcribed from the opposite
strand to that which is used for transcribing the Xist
gene. Because the transcription unit of the new gene completely overlaps the
Xist gene, and is in the reverse orientation, it has
been named Tsix (Lee
et al., 1999). This has given rise to the
idea that Xist may be regulated by the
Tsix gene (see Heard
et al., 1999 for possible models of
Tsix regulation).
8.6. The unique organization and expression of Ig and TCR genes
The organization and expression of immunoglobulin (Ig) and T-cell receptor (TCR)
genes is in many ways quite different from that of other genes. This is so because
of the need for each individual to produce a huge variety of different Igs and TCRs.
An individual B or T lymphocyte is monospecific and produces a
single type of Ig or TCR; it is the population of different B and T cells in any one
individual that enables the synthesis of so many different types of these molecules.
B and T lymphocytes need to be extremely diverse because they represent the cells
that provide antibody responses or cell-mediated responses to foreign antigen: by
providing a large repertoire of Igs and TCRs, the possibilities for being able to
recognize and bind very many different types of foreign antigen are greatly
increased.
8.6.1. Ig and TCR genes exhibit a unique organization: multiple gene segments can
encode each of several different regions of the polypeptide
Polypeptide structure
Table 8.9
Ig classes and subclasses
| Class (subclasses) | Heavy chain | Characteristics |
|---|---|---|
| IgA (IgA1, IgA2) | α (α1, α2) | Predominant Ig in seromucous secretions, e.g. saliva, milk, etc. |
| IgD | δ | Low in serum but present in large quantities on surface of many circulating B cells |
| IgE | ε | Especially on surface membrane of basophils and mast cells |
| IgG (IgG1, IgG2, IgG3, IgG4) | γ (γ1, γ2, γ3, γ4) | Major serum Ig |
| IgM | μ | Predominant ‘early’ antibody |
An Ig molecule consists of four polypeptide chains, two identical heavy
chains and two identical light chains (see
Figure 7.10). The light chains fall into two classes: kappa
(κ) and lambda (λ) light chains, which are functionally
equivalent. At the N-terminal segments of each type of chain are the
so-called variable (V) regions, which need to bind foreign antigen; the
remaining C-terminal segments are constant (C) regions. In the case of the
heavy chains, there are different alternatives for the constant region which
specify the tissues in which the Ig will be expressed and dictate the
immunoglobulin class (Table 8.9).
Similarly, TCRs, which provide cell-mediated immune responses to foreign
antigens, consist of two types of chain. Each such chain has Ig-like
variable regions which bind foreign antigen, and constant regions which
anchor the molecule to the cell surface (see
Figure 7.10). The most frequently occurring TCRs have an
α and a β chain; a minor population consists of a
γ chain and a δ chain.
Gene structure
Table 8.10
Functional human Ig and TCR loci
| Locus | Location | V gene segments | D gene segments | J gene segments | C gene segments |
|---|---|---|---|---|---|
| IGH | 14q32.3 | 86 | 30 | 9 | 11 |
| IGK | 2p12 | 76 | 0 | 5 | 1 |
| IGL | 22q11 | 52 | 0 | 7 | 7 |
| TCRA | 14q11-12 | 60 | 0 | 75 | 1 |
| TCRB | 7q32-33 | 70–100 | 2 | 13 | 2 |
| TCRG | 7p15 | 8 | 0 | 5 | 2 |
| TCRD | 14q11-12 | 6 | 3 | 3 | 1 |
Figure 8.26. The Ig heavy chain locus on 14q32 contains about 86 variable
(V) region sequences, 30 diversity (D) segments, nine joining
(J) segments and 11 constant region (C) sequences.
The entire locus spans about 1200 kb of 14q32.3 and, for
clarity, is shown as three segments of 400 kb from the
telomeric end (top) to the centromeric end (bottom). Although
the DH segments are mostly located in a few clusters separating
the VH and JH segments, at least one such
segment is located in the JH segment region. Segments
indicated by open circles have the required open reading frames
but have not been observed in productive rearrangements and so
their functional status is unknown. Segments which are known to
be nonfunctional are indicated in blue, and account for
approximately one third of all the segments.
Note that although this is the only
functional human heavy chain locus, small clusters of
VH and D segments are also located on 15q11.2 and
16p11.2. Adapted from data in Cook et al. (1994)
Nature Genet., 7, pp.
162–168, with permission from Nature America Inc.
The genes which encode the different types of chain in Igs and TCRs are
located on different chromosomes and are organized as clusters of numerous
gene segments (Table 8.10). Each such cluster is unusual in that the coding
sequences for specific segments of each chain are often present in numerous
different copies that are sequentially repeated. For example, although the
constant region of the human κ light chain is encoded by a single Cκ
sequence, the variable region is encoded by a combination of a Vκ segment
(which encodes most of the variable region) and a short Jκ segment (joining
segment; encodes a small part at the C-terminal end of the variable region),
which are selected from a total of about 76 alternative Vκ segments and five
alternative Jκ segments. Although the λ light chain is similarly encoded by
Vλ, Jλ and Cλ segments, the heavy chain Ig locus shows some differences. The
variable region is encoded by a combination of a VH gene segment, a JH gene
segment and also a DH gene segment (encoding a diversity segment), each
selected from many repeated gene segments. Additionally, there are a variety
of different CH sequences which specify the class of the Ig (see above). In
total this cluster comprises about 140 gene segments, of which about
one-third are known to be incapable of expression, and spans about 1200 kb.
As each Ig gene cluster or TCR gene cluster in an individual B or T
lymphocyte only ever gives rise to at most one Ig or TCR polypeptide, an
entire cluster can functionally be regarded as a single, albeit unusual,
type of gene. However, the individual gene segments cannot be regarded as
the functional equivalent of classical exons. This is so
because individual gene segments in these clusters are sometimes composed of
coding DNA and noncoding DNA and may consist of several exons. For example,
each of the human CH sequences is itself composed of three or
four classical exons separated by introns: after transcription into RNA, the
intronic sequences are discarded, and only the exonic sequences are retained
in the mRNA.
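The segment counts in Table 8.10 give a rough feel for how much diversity can
arise simply from choosing one segment of each type. The following is a minimal
sketch, not a measured estimate: it assumes that every listed segment is usable
(whereas about one-third of the heavy chain segments are known to be
nonfunctional), that any heavy chain can pair with any light chain, and it
ignores the variability introduced by imprecise joining. The names `SEGMENTS`
and `vdj_combinations` are illustrative only.

```python
# Segment counts copied from Table 8.10 (V, D and J columns).
# A D count of 0 means the locus has no D segments.
SEGMENTS = {
    "IGH": {"V": 86, "D": 30, "J": 9},
    "IGK": {"V": 76, "D": 0, "J": 5},
    "IGL": {"V": 52, "D": 0, "J": 7},
}


def vdj_combinations(locus: str) -> int:
    """Upper bound on distinct V-J or V-D-J units for one locus.

    Assumption: every segment in the table can take part in a
    productive rearrangement, which overstates the true number.
    """
    counts = SEGMENTS[locus]
    total = counts["V"] * counts["J"]
    if counts["D"] > 0:
        total *= counts["D"]
    return total


heavy = vdj_combinations("IGH")   # 86 * 30 * 9 = 23220
kappa = vdj_combinations("IGK")   # 76 * 5 = 380
lam = vdj_combinations("IGL")     # 52 * 7 = 364

# A B cell uses either a kappa or a lambda light chain, never both,
# so the light chain possibilities add rather than multiply.
light = kappa + lam               # 744

# Assumption: any heavy chain can pair with any light chain.
ig_pairs = heavy * light          # 17275680

print(f"heavy: {heavy}, light: {light}, heavy x light: {ig_pairs}")
```

Even under these simplifications, a few hundred gene segments yield millions of
heavy and light chain pairings, which is the point of the segmented organization.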
8.6.2. Programmed DNA rearrangements at the Ig and TCR loci occur during the
maturation of B and T lymphocytes, respectively
The unique arrangement of gene segments in the Ig and TCR gene clusters reflects
the very unusual way in which somatic recombinations are required in B and T
lymphocytes before functional Ig and TCR genes can be assembled and then
expressed (see below). Such somatic recombinations result in bringing together
different combinations of the different gene segments in different individual
lymphocytes. Consequently, they can be regarded as both
tissue-specific (confined to B and T lymphocytes) and
cell-specific events which involve alternative DNA
splicing (as opposed to alternative RNA splicing, which brings about
different combinations of exons at the RNA level; see
Section 8.3.2). As a result, the original
germline gene organization is altered: gene segments that were distant in the
germline are spliced together at the DNA level. Because the choice of which of
the many repeated gene segments are recombined to give a functional V-J or V-D-J
unit is cell specific, individual B and T cells produce
different Igs and TCRs. This means that, in a sense, every individual is a
mosaic with respect to the organization of the Ig and TCR genes in B and T
lymphocytes; even identical twins will diverge genetically.
The rearrangements which lead to the production of functional light chains and
heavy chains of Igs are slightly different.
- Making a light chain. In order to generate a functional
κ light chain Ig, for example, a somatic recombination event
brings together a specific combination of one of the
Vκ gene segments and one of the
Jκ gene segments (V-J joining). Thereafter,
splicing to the single Cκ sequence occurs
at the RNA level.
- Making a heavy chain. Two successive somatic
recombinations are required, resulting first in
DH-JH joining, and then
VH-DH-JH joining. Subsequently the
resulting VH-DH-JH coding sequence is
spliced at the RNA level to the nearest CH
sequence, initially Cμ.
Figure 8.27. Igs are synthesized following somatic recombination of V and
J, or V, D and J segments and subsequent RNA splicing to C sequences.
(A) Light chain synthesis. Somatic recombination (DNA splicing) results in
joining of a specific variable (V) segment to a specific joining (J) segment;
the example shows a V3-J2 joining, which is only one of many possibilities.
The VJ unit is then spliced to the constant region (C) sequence by RNA
splicing. (B) Heavy chain synthesis. Two sequential somatic recombinations
produce first D-J joining, then a VDJ unit. Subsequent RNA splicing results in
splicing of the VDJ sequence to the Cμ sequence. As the B cell matures,
however, subsequent somatic recombinations result in joining of the previously
selected VDJ unit to different C genes (heavy chain switch, see text).
Because there are three types of functional Ig gene loci in human cells (heavy
chain, κ light chain and λ light chain), and because these
occur on both maternal and paternal homologs, there are six chromosomal segments
in which DNA rearrangements can result in production of an Ig chain. However, an
individual B cell is monospecific: it produces only one type of
Ig molecule with a single type of heavy chain and a single type of light chain.
This is so for two reasons:
- Allelic exclusion. A light chain or a heavy chain can be
synthesized from a maternal chromosome or a paternal chromosome in any
one B cell, but not from both parental homologs. As a result, there is
monoallelic expression at the heavy chain gene locus in B cells. This
phenomenon also applies to TCR gene clusters.
- Light chain exclusion. A light chain synthesized in a
single B cell may be a κ chain, or a λ chain, but
never both. As a result of this requirement, plus that of allelic
exclusion, there is monoallelic expression at one of the two functional
light chain gene clusters and no expression at the other.
The choice of which of the two heavy chain alleles is used, and of which of
the four chromosomal regions capable of encoding a light chain (two κ and
two λ) is used, appears to be random. Most likely, in each B-cell precursor,
DNA rearrangements are attempted at all six Ig alleles, but the chances of
productive rearrangements in more than one light chain cluster or more
than one heavy chain allele may not be high. Additionally, however,
there appears to be some kind of negative feedback regulation: a
functional rearrangement at one of the heavy chain alleles suppresses
rearrangements occurring in the other allele, and a functional
rearrangement at any one of the four regions capable of encoding a light
chain suppresses rearrangements occurring in the other three.
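The combination of random choice and negative feedback described above can be
sketched as a toy simulation. This is an illustration of the logic only, not a
model of the underlying biochemistry: it assumes that alleles are tried in a
random order, that each attempt is productive with the same probability, and
that the first productive rearrangement blocks all further attempts. The value
of `p_productive`, the allele labels and the function names are hypothetical.

```python
import random

# Two heavy chain alleles and four regions able to encode a light chain.
HEAVY_ALLELES = ["IGH-maternal", "IGH-paternal"]
LIGHT_ALLELES = ["IGK-maternal", "IGK-paternal", "IGL-maternal", "IGL-paternal"]


def first_productive(alleles, p_productive, rng):
    """Try alleles in random order and stop at the first productive one.

    Stopping at the first success stands in for the negative feedback:
    a functional rearrangement suppresses rearrangement elsewhere.
    Returns the chosen allele, or None if every attempt fails.
    """
    order = list(alleles)
    rng.shuffle(order)
    for allele in order:
        if rng.random() < p_productive:
            return allele
    return None


def simulate_b_cell(p_productive=0.3, seed=None):
    """Return the single heavy and single light allele a cell expresses.

    p_productive is an arbitrary illustrative value, not a measured rate.
    """
    rng = random.Random(seed)
    return {
        "heavy": first_productive(HEAVY_ALLELES, p_productive, rng),
        "light": first_productive(LIGHT_ALLELES, p_productive, rng),
    }


if __name__ == "__main__":
    for seed in range(5):
        print(simulate_b_cell(seed=seed))
```

Because each call returns at most one heavy chain allele and at most one light
chain region, every simulated cell is monospecific by construction; a cell for
which either value is `None` has failed to make a complete Ig molecule.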
8.6.3. V-J and V-D-J joining is often achieved by intrachromatid deletions, and also
by megabase inversions in the former case
Figure 8.28. Inversion or deletion results in V-J splicing to produce
functional Ig κ light chain genes.
The human κ light chain gene cluster contains about 76 Vκ segments arranged
in two large clusters, in opposite orientations. V segments in the distal
cluster have the opposite orientation to the Jκ segments and the single Cκ
sequence. As a result, the DNA rearrangements used to splice distal Vκ
segments to a Jκ segment are megabase inversions (Weichhold et al., 1990).
Those in the proximal cluster can undergo V-J joining by a somatic
recombination resulting in a deletion of the intervening chromosomal segment,
most likely through an intrachromatid recombination event such as that used
in class switching.
The genetic mechanism leading to V-J and V-D-J joining often involves large-scale
deletions which are thought to occur by an intrachromatid recombination event,
similar to those used in V-D-J-C joining (see next section). In addition, V-J
joining often occurs as a result of megabase inversions. The human κ
light chain gene locus spans about 1840 kb on 2p12 and includes about 76
Vκ segments, mostly comprising pairs of duplicated V gene segments, organized
as two clusters: a proximal cluster located adjacent to the Jκ segments and to
the Cκ segment, and a distal cluster. This arrangement reflects an inverted
repeat structure: V gene segments in the proximal Vκ cluster usually have a
corresponding duplicate in the distal Vκ cluster, which is separated from the
proximal cluster by about 800 kb and is in the opposite orientation. Depending
on which V segment cluster is involved, V-J joining occurs by two possible routes:
- Proximal cluster. The Vκ segments have the same orientation as the Jκ
segments, and V-J joining occurs by deletion of the intervening DNA, most
likely through an intrachromatid recombination event.
- Distal cluster. The Vκ segments have the opposite orientation to the Jκ
segments, and V-J joining occurs by a megabase inversion (Weichhold et al.,
1990).
Note that the joining process is imprecise, and so can also introduce a measure
of variability in the sequence at the junctions of joined segments.
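The difference between the deletion and inversion routes can be made concrete
with a small sketch that treats the locus as an ordered list of oriented
segments. This is a deliberately simplified picture: the segment names
(`Vd1`, `Vp1` and so on), the tiny number of segments and the function
`vj_join` are all hypothetical, and the sketch ignores the recombination
signals and the imprecision of the junctions.

```python
from typing import List, Tuple

Segment = Tuple[str, str]  # (name, orientation), orientation is "+" or "-"


def flip(segment: Segment) -> Segment:
    """Return the same segment in the opposite orientation."""
    name, strand = segment
    return (name, "+" if strand == "-" else "-")


def vj_join(locus: List[Segment], v_name: str, j_name: str) -> Tuple[List[Segment], str]:
    """Bring one V segment next to one J segment.

    Same orientation: the DNA between V and J is deleted.
    Opposite orientation: the stretch from V up to (not including) J is
    inverted, so V ends up next to J and nothing is lost.
    Returns the rearranged locus and the route used.
    """
    names = [name for name, _ in locus]
    v, j = names.index(v_name), names.index(j_name)
    if v >= j:
        raise ValueError("this sketch expects the V segment upstream of the J segment")
    if locus[v][1] == locus[j][1]:
        return locus[: v + 1] + locus[j:], "deletion"
    inverted = [flip(segment) for segment in reversed(locus[v:j])]
    return locus[:v] + inverted + locus[j:], "inversion"


# Hypothetical miniature kappa locus: distal V cluster in the opposite
# orientation to the proximal V cluster, the J segments and C.
GERMLINE = [
    ("Vd1", "-"), ("Vd2", "-"),
    ("Vp2", "+"), ("Vp1", "+"),
    ("J1", "+"), ("J2", "+"),
    ("C", "+"),
]

proximal_result, proximal_route = vj_join(GERMLINE, "Vp2", "J2")
distal_result, distal_route = vj_join(GERMLINE, "Vd2", "J1")

print(proximal_route, proximal_result)
print(distal_route, distal_result)
```

In the deletion case the segments lying between the chosen V and J are lost
from the rearranged locus, whereas in the inversion case every segment is still
present but the inverted stretch has changed order and orientation.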
8.6.4. Class switching of heavy chains involves differential joining of a single VDJ
unit to alternative DNA segments encoding constant regions
Although a B cell produces only one type of Ig molecule, the heavy chain class
(see Table 8.9) can change during the cell lineage (class switching or isotype
switching). Such switching involves differential joining of the same VDJ unit
that was brought together by two successive somatic recombinations to different
segments encoding alternative constant regions. The initial joining of a VDJ
sequence to constant region segments is accomplished at the RNA level. However,
subsequently, class switching involves joining the same VDJ unit at the DNA
level to alternative constant regions by yet more somatic recombination events
(V-D-J-C joining). Class switching involves the following progression:
- Initial synthesis of IgM only by immature B cells. This
occurs because the VDJ unit is spliced at the RNA level to a
Cμ sequence.
- Later synthesis of both IgM and IgD by immature B cells.
The partial switch to making IgD occurs because the VDJ unit can be
spliced at the RNA level to a Cδ sequence, as a
result of alternative RNA splicing.
- Synthesis of IgG, IgE or IgA by mature B cells. Class
switching events involve splicing the same VDJ unit to a
Cγ, Cε or
Cα sequence, respectively, at the DNA level as
a result of a somatic recombination event (VDJ-C joining). The mechanism
involves deletion of the intervening sequence by intrachromatid
recombination.
Figure 8.29. Ig heavy chain class switching is mediated by intrachromatid
recombination.
Note that joining of the same VDJ unit to a
Cμ or a Cδ sequence
occurs at the level of RNA splicing to generate heavy chains for
IgM and IgD respectively. In contrast, class switching to
generate IgA, IgE or IgG involves joining of the same VDJ unit
at the DNA level to, respectively, a Cα,
Cε or, as illustrated in the figure, a
Cγ sequence.
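The contrast between RNA-level and DNA-level joining in the progression above
can be sketched as follows. This is a simplified illustration: the locus is
reduced to one VDJ unit followed by one constant sequence per class, the order
of the constant sequences in `REARRANGED_LOCUS` is chosen for illustration and
is not the real gene order, and the function names are hypothetical.

```python
from typing import List

ISOTYPE = {
    "Cmu": "IgM", "Cdelta": "IgD",
    "Cgamma": "IgG", "Cepsilon": "IgE", "Calpha": "IgA",
}

# Heavy chain locus after V-D-J joining. Illustrative order only.
REARRANGED_LOCUS = ["VDJ", "Cmu", "Cdelta", "Cgamma", "Cepsilon", "Calpha"]


def isotypes_by_rna_splicing(locus: List[str]) -> List[str]:
    """Classes obtainable from the current DNA without further rearrangement.

    The VDJ unit is spliced at the RNA level to the nearest C sequence;
    while Cmu and Cdelta are both still present, alternative RNA splicing
    can also give IgD.
    """
    nearest = locus[1]
    if nearest == "Cmu":
        classes = ["IgM"]
        if len(locus) > 2 and locus[2] == "Cdelta":
            classes.append("IgD")
        return classes
    return [ISOTYPE[nearest]]


def class_switch(locus: List[str], target: str) -> List[str]:
    """DNA-level switch: delete everything between VDJ and the target C."""
    if target in ("Cmu", "Cdelta"):
        raise ValueError("IgM and IgD are produced by RNA splicing, not by DNA rearrangement")
    position = locus.index(target)  # raises ValueError if already deleted
    return [locus[0]] + locus[position:]


switched = class_switch(REARRANGED_LOCUS, "Cepsilon")

print(isotypes_by_rna_splicing(REARRANGED_LOCUS))  # ['IgM', 'IgD']
print(switched)                                    # ['VDJ', 'Cepsilon', 'Calpha']
print(isotypes_by_rna_splicing(switched))          # ['IgE']
```

Because the switch in this sketch works by deletion, a constant sequence that
has been removed cannot be selected later: calling
`class_switch(switched, "Cgamma")` raises a `ValueError`.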