Copyright: © 2008 Hauenschild et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Evolutionary Plasticity of Polycomb/Trithorax Response Elements in Drosophila Species

1 Universität Bielefeld, Center for Biotechnology (CeBiTec), Bielefeld, Germany
2 Institute of Molecular Biotechnology (IMBA), Vienna, Austria
3 Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), Heidelberg, Germany
4 Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
5 Gregor Mendel Institute of Molecular Plant Biology (GMI), Vienna, Austria

Michael B Eisen, Academic Editor
University of California Berkeley, United States of America

# Contributed equally.
* To whom correspondence should be addressed. E-mail: marc.rehmsmeier/at/gmi.oeaw.ac.at (MR); E-mail: ringrose/at/imp.univie.ac.at (LR)

Received June 6, 2008; Accepted September 15, 2008.

Abstract

cis-Regulatory DNA elements contain multiple binding sites for
activators and repressors of transcription. Among these elements are enhancers,
which establish gene expression states, and Polycomb/Trithorax response elements
(PREs), which take over from enhancers and maintain transcription states of
several hundred developmentally important genes. PREs are essential to the
correct identities of both stem cells and differentiated cells. Evolutionary
differences in cis-regulatory elements are a rich source of
phenotypic diversity, and functional binding sites within regulatory elements
turn over rapidly in evolution. However, more radical evolutionary changes that
go beyond motif turnover have been difficult to assess. We used a combination of
genome-wide bioinformatic prediction and experimental validation at specific
loci to evaluate PRE evolution across four Drosophila species.
Our results show that PRE evolution is extraordinarily dynamic. First, we show
that the numbers of PREs differ dramatically between species. Second, we
demonstrate that functional binding sites within PREs at conserved positions
turn over rapidly in evolution, as has been observed for enhancer elements.
Finally, although it is theoretically possible that new elements can arise out
of nonfunctional sequence, evidence that they do so is lacking. We show here
that functional PREs are found at nonorthologous sites in conserved gene loci.
By demonstrating that PRE evolution is not limited to the adaptation of
preexisting elements, these findings document a novel dimension of
cis-regulatory evolution.

Author Summary

The evolution of regulatory DNA plays a crucial role in making species
different from one another. One way to study the evolution of regulatory DNA
is by genome alignment, which assumes that elements with conserved function
will be found in conserved pieces of DNA. Although conservation does imply
function, it does not follow that all functional elements must be conserved,
nor that nonconserved DNA has no function. However, computational approaches
based on genome alignment alone cannot identify any kind of evolution beyond
small changes in otherwise conserved elements. We have used a novel
computational approach, in combination with experimental validation, to
examine how regulatory DNA evolves in four Drosophila
species. We focus on Polycomb/Trithorax response elements (PREs), which
regulate several hundred developmental genes, and are vital for maintaining
cell identities. We find that PRE evolution is extraordinarily dynamic: not
only motif composition, but also the total number of PREs, and even their
genomic positions, have changed dramatically in evolution. By demonstrating
that the evolution of PREs goes far beyond the gradual adaptation of
preexisting elements, this study documents a novel dimension of regulatory
evolution. We propose that PRE evolution provides a rich source of potential
diversity between species.

Introduction

cis-Regulatory DNA elements are essential for the correct
activation, repression, and maintenance of gene expression. These elements typically
contain multiple short DNA motifs, which are recognised by sequence-specific DNA-binding
proteins that either act themselves as activators and repressors of
transcription or recruit other proteins that do so [1,2]. One class of cis-regulatory DNA elements is
enhancers, which establish gene expression states. Another important class is
Polycomb/Trithorax response elements (PREs), first identified in the
Drosophila homeotic (hox) gene complexes [3,4], where they maintain the
transcriptional states of hox genes that have been determined earlier on in
development by embryonic enhancers [5–7]. The hox PREs preserve the
transcription patterns of their associated genes stably over many cell generations,
long after the proteins that bind the enhancers have disappeared. Thus, hox PREs are
epigenetic memory elements [8]. Although PREs are similar to enhancers in many ways, the most
important functional difference between these two types of elements is that
enhancers respond to local differences in concentration of the transcription factors
that bind them, whereas the Polycomb group (PcG) and Trithorax group (TrxG) proteins
are ubiquitously expressed; thus, the PRE element responds to the transcriptional
state of the promoter [3,4].
Since their initial discovery in the hox complexes, it has become clear that PREs
also regulate several hundred other genes. In both flies and vertebrates, the
targets of Polycomb regulation include genes involved in major cell-fate decisions,
and in several differentiation and morphogenetic pathways [9–15]. Consistent with the nature of these
target genes, the PcG proteins are essential to the correct identities of both stem
cells and differentiated cells [16,17]. In D. melanogaster, many
PRE elements that have similar functional properties in transgenic assays are
enriched in preferred pairs of motifs, enabling the identification of a subset of
Drosophila PREs by computational prediction [18,19]. However, these same elements show
no preferred order or number of motif pairs, suggesting that the design of PREs in
terms of linear arrangement of motifs is flexible [18]. Furthermore, fly PREs can act many
tens of kilobases upstream, downstream, or in the introns of the genes they regulate
[9,10], suggesting that
their position relative to their cognate promoter is also flexible. This diversity
of design among D.
melanogaster PREs raises the question of whether these differences
are important for function, and whether PRE position at each gene is conserved
across different Drosophila species. The
bithoraxoid (bxd) PRE, which regulates the hox
gene Ultrabithorax (Ubx; FBgn0003944), shows large
blocks of conserved sequence across several Drosophila species,
supporting the idea that PRE position is evolutionarily constrained [7,20]. However, the conservation of the
several hundred other PREs in the D.
melanogaster genome has not been evaluated, and it is not known
whether these PREs are also evolutionarily constrained. The effects of evolutionary changes in enhancers and promoters have been well studied
for several individual genes in diverse organisms [21,22]. Starting from a known cis-regulatory
element in one species, the orthologous sequences in other species have been
analysed in terms of evolutionary changes and their impact on regulatory function.
These studies have demonstrated that many cis-regulatory elements
show rapid motif turnover [23,24]. In
cases in which function has been evaluated, these studies have shown that some
enhancers tolerate evolutionary change without large differences in function
[25–27]. On the other hand, there are also many examples of
evolutionary differences in enhancer sequences that lead to major phenotypic changes
[21,22,28–30]. Thus,
cis-regulatory elements are a potential source of phenotypic
diversity, and it has been proposed that positive selection acts primarily on
cis-regulatory sequences rather than protein-coding sequences
[31–33]. The genomic sequencing of several closely related species has
enabled the study of cis-regulatory evolution on a genome-wide
scale [34–36]. To overcome the inherent difficulties in identifying
cis-regulatory elements in genomic sequence, much effort has
been invested in comparative genomic approaches, based on the idea that in closely
related species, functional elements will be more conserved than nonfunctional DNA
[36–39]. Thus, to date, both gene-specific and genome-wide evaluation
of cis-regulatory evolution have been limited to the examination of
local changes within elements that are otherwise conserved between species. These
studies have given rise to the view that cis-regulatory evolution
operates on existing elements, in which small changes create novel functions
[22]. However, there is also evidence that argues against local motif turnover as the only
source of cis-regulatory evolution. First, although conservation
certainly does imply function [1,35,36], it
does not necessarily follow that all functional elements must be conserved, nor that
nonconserved DNA has no function [2,37,40].
Indeed, it has been shown theoretically that new elements may arise at a certain
frequency from nonfunctional sequences [41,42], generating functional elements that reside at
nonorthologous positions in the genomes of related species. Consistent with this
prediction, a recent genome-wide chromatin immunoprecipitation on chip (ChIP-chip)
study in D. melanogaster
embryos has demonstrated that many transcription factor binding sites are not
evolutionarily conserved, suggesting that comparative genomics has limited ability
to identify true functional cis-regulatory elements [2]. By definition, computational approaches based on genome alignment alone cannot
identify cis-regulatory elements whose sequence and genomic
position are not conserved. Thus, with this approach, it has not been possible to
evaluate any aspect of cis-regulatory evolution beyond local motif
turnover. An alternative means to ascertain whether more radical types of
cis-regulatory evolution do indeed occur would be to begin by
analysing single genomes using computational prediction tools, and subsequently, to
compare results across several genomes. Since all computational predictions are
prone to false-positive and false-negative results, an essential final step would be
to validate predictions experimentally. In this paper, we use a combination of alignment-independent prediction of
cis-regulatory elements [18,19], comparative genomics, and experimental validation to
examine cis-regulatory evolution beyond motif turnover for PREs in
four Drosophila species. This analysis shows that PRE evolution is
extraordinarily dynamic. We show both computationally and experimentally that the
numbers of PRE elements, their motif composition, and their genomic position change
rapidly in evolution. We identify at least two classes of PREs: those whose
positions are constrained in evolution (such as the hox PREs), and those that do not
have constrained positions. Remarkably, despite the general conservation of the hox
PREs, we identify an extra functional PRE in the Bithorax complex of D. pseudoobscura. By demonstrating
that PRE evolution is not limited to the adaptation of preexisting elements, these
findings document a novel dimension of cis-regulatory evolution.
The implications of these findings for evolutionary diversity are discussed.

Results

The DNA Sequence Criteria for PRE Function Are Essentially Identical in Four Drosophila Species

We have previously developed an algorithm that predicts PREs in the genome of
D. melanogaster
by scoring for favoured pairs of binding sites for proteins that act on them
[18,19]. In [18], 43 predicted PREs were selected for experimental analysis; 29 of these were
enriched for PcG proteins in ChIP experiments in S2 cells. Of the remaining
14 sites that were not enriched in [18], 12 were found to be strongly
enriched for PcG proteins in other cell types or were confirmed in transgenic
assays [10,11,18]. Thus, 41 of these 43 predictions (over 95%) were functional
in one cell type or another, confirming the
predictive power of the algorithm for correctly identifying PRE elements. Comparison of the full set of 167 predictions [18] with genome-wide binding
profiles of PcG proteins performed in different cell types or in embryos
[9–11] revealed a partial overlap. Using the most statistically
stringent score cutoff (a score of 157, corresponding to an
E-value, or expected number of false positives, of 1.0), PREs
were correctly predicted at 20% (37 of 186) of experimentally defined
binding sites in Sg4 cells [10]. Lower score cutoffs gave higher
coverage of ChIP sites [8]; however, it is not clear how many of the detected PcG
binding sites in [10] contain functional PREs. Indeed, a recent ChIP-chip
analysis of transcriptional regulators in Drosophila embryos
demonstrated that many detected binding sites appear not to be functional
[2].
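As an aside on the score cutoff used above: Materials and Methods describes the E-value as an empirical quantity, obtained by scoring random sequence data 100 times the size of the genome and counting the hits at or above a given score. The following Python sketch illustrates only that counting rule. The function names and the toy scores are hypothetical, and the code is not taken from the jPREdictor software.

```python
def empirical_evalue(score, random_scores, genome_multiples):
    """Expected number of false positives per genome at this score.

    random_scores: scores of all hits found in the random sequence data
    genome_multiples: size of the random data in genome units (100 in the paper)
    """
    hits = sum(1 for s in random_scores if s >= score)
    return hits / genome_multiples


def score_cutoff(random_scores, genome_multiples, target_evalue=1.0):
    """Lowest observed score whose empirical E-value does not exceed the target."""
    best = None
    # Walk distinct scores from high to low; the E-value can only grow.
    for s in sorted(set(random_scores), reverse=True):
        if empirical_evalue(s, random_scores, genome_multiples) <= target_evalue:
            best = s
        else:
            break
    return best


# Toy illustration with invented scores and random data worth 4 "genomes".
toy_scores = [200, 180, 160, 157, 150, 140]
cutoff = score_cutoff(toy_scores, genome_multiples=4, target_evalue=1.0)
print(cutoff, empirical_evalue(cutoff, toy_scores, 4))
```

In this reading, lowering the cutoff admits more random hits per genome, which is the trade-off between sensitivity and specificity discussed in this section.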
In addition, we predict many PREs at sites at which no ChIP enrichment was
observed [10]. These include, for example, the well-characterised
Fab-7 PRE [43]. For a selection of these
predicted sites, ChIP in other cell types (9/12 positive) and transgene analysis
(3/3 positive) have confirmed that they are indeed bona fide PRE elements and
not false-positive predictions [9,18]. The fact that these predicted and verified PREs were not all enriched in any one
cell type is consistent with the partial overlap observed between three recent
genome-wide Polycomb binding profiles (28% to 34%)
generated by ChIP or DNA adenine methyltransferase mapping (DamID) on different
D. melanogaster
cell types [8–11,15]. Other studies have also observed discrepancies between
genome-wide ChIP data and conserved cis-regulatory elements
identified by comparative genomics [2,36]. Together, these comparisons
show that neither ChIP nor computational analysis provides a comprehensive list
of all cis-regulatory elements in the genome: computational
analysis can identify sites of potential function, whereas ChIP gives a measure
of cell-type– or developmental-stage–specific deployment of
these elements. For this reason, in the present study, we combine computational
prediction of PRE elements with ChIP and transgenic analysis of specific loci. To assess the evolutionary behaviour of PREs independent of genome alignment, we
applied the algorithm to four Drosophila genomes:
D. melanogaster,
D. simulans,
D. yakuba, and
D. pseudoobscura. The algorithm was trained on
D. melanogaster
PRE sequences. Its performance on other Drosophila genomes was
confirmed by comparison of PRE predictions in the homeotic Bithorax complexes of
all four species, showing that well-characterised PREs in D. melanogaster are also
predicted with high significance at orthologous sites in the three other genomes
(Figure 1
In order to measure PRE function by independent means, we used a transgenic
reporter assay in which a PRE sequence is linked to the
miniwhite gene. The predicted bxd PRE (Figure 1

A Dynamic Scoring System Increases the Sensitivity of PRE Prediction by Using Comparative Genomic Information

For PRE prediction in a single genome, we previously used a stringent score
cutoff of 157, corresponding to an E-value (expected number of
false positives) of 1.0 [18]. This emphasis on specificity had costs for
sensitivity: with a score cutoff at 157, only 20% of sites identified
by a later ChIP study were predicted [10]. Aiming to improve sensitivity
without costs for specificity, we took steps to adapt the algorithm. We first
tested binding sites for other proteins such as DSP1 (FBgn0011764)
[44],
GRH (FBgn0259211) [45], and SP1/KLF (FBgn0020378; FBgn0040765) [46]. However, the
inclusion of these sites did not improve the predictive power of the algorithm,
but merely lowered the stringency (M. Rehmsmeier, T. Fiedler, and A.
Hauenschild, unpublished data). The original motif set [18] was thus used
for further experiments. We reasoned that the inclusion of comparative genomic data could increase the
predictive power of the algorithm. The presence of a high-scoring hit at an
orthologous or close position in a second genome would increase statistical
confidence. Thus, for the present study, we employed this principle to calculate
a sliding scale of score thresholds (Figure 2

D. pseudoobscura Has More PREs Than Three Other Species

Using this approach, we performed PRE predictions on the four genomes in all
possible pairwise combinations. In each search, the starting point was a set of
predictions that scored highly (above 157) in a single genome. Remarkably, in
these single-genome analyses, the number of predicted PREs in D. pseudoobscura (560) was over
twice that predicted in any of the other species (D. melanogaster: 201,
D. simulans:
143, and D. yakuba:
203), despite almost identical genome size [35]. To evaluate interspecies
differences in PRE number by independent experimental means, we examined the
distribution of PC by immunofluorescence on polytene chromosomes prepared from
third instar larvae of the four species. This analysis detected over twice as
many PC bands in D.
pseudoobscura as in the other three species, consistent with the
prediction of over twice as many PREs (Figure 2

The Genomic Position of PREs Is Predicted to Change Rapidly in Evolution

Despite these differences in PRE number, we expected that a large proportion of
PREs would have conserved genomic position. To ascertain whether this is indeed
the case, we compared each predicted PRE in a given genome to its nearest
counterpart, identified by dynamic scoring in a second genome. For each PRE hit
in the first genome, a BLAST search was performed on the second genome, and the
distance between the BLAST hit and the nearest statistically significant PRE was
calculated (Figure 2

PREs with Constrained Position Show Motif Turnover

To test these predictions experimentally, we performed ChIP on embryos from all
four species to evaluate binding of PcG proteins to predicted PRE sites in vivo.
We focused on specific examples of two classes of predicted PRE: those that have
conserved position, and those that do not. For PREs with conserved position, we
selected bxd and spalt major
(salm; FBgn0004579) as examples of PREs that have been
confirmed in D.
melanogaster [4,10,47]. ChIP analysis in embryos from
all four species demonstrated robust PcG binding to these predicted PREs
(bxd, Figure 1

The bxd and salm PREs reside in orthologous
regions in all four genomes, enabling us to ask whether the motifs that
contribute to PRE function are located in the regions of highest conservation
[20].
Unexpectedly, this was not the case (Figure 3

Furthermore, although each PRE has one or more clusters of motifs, the position
and order of motifs within each cluster are not conserved. This is most striking
in the D. melanogaster–D. pseudoobscura comparison
(red motifs, Figure 3

PREs at the trh and dpp Loci Have Changed Position during Evolution

We next selected examples of PREs that are predicted not to have conserved
position, and used ChIP and transgenic assays to evaluate PRE function of the
orthologous and nonorthologous sequences within selected loci. For this
analysis, the trachealess (trh; FBgn0003749),
decapentaplegic (dpp, FBgn0000490), and
abdominal-A (abd-A; FBgn0000014) loci were
selected (Figure 4

For dpp, the situation is more complex: there are three
predicted PRE sites, which have different scores in different species. Site 1 is
approximately 12 kb upstream of the dpp promoter, site 2 is 5
kb upstream, and site 3 is at the promoter (Figure 4

The D. pseudoobscura Bithorax Complex Contains an Additional PRE

In several cases, a PRE was predicted in one species, but had no detectable
counterpart in other species. Two such examples are shown in Figure S2
(in the unpaired 2 locus) and in Figure 4

The predicted extra D. pseudoobscura PRE was bound by PcG proteins in D. pseudoobscura embryos (Figure 4

Genome-Wide Comparisons Predict That PREs Can Arise from Nonfunctional Sequence

The presence of an additional functional PRE in the D. pseudoobscura Bithorax
complex is intriguing, particularly since the positions of other PREs at this
locus are so well conserved. This PRE may be a remnant of an ancestral Bithorax
complex, which has lost the PRE at that position in some lineages.
Alternatively, the D.
pseudoobscura PRE may have arisen from nonfunctional sequence
and been fixed by positive selection. To evaluate these two possibilities, PRE
scores were calculated for the orthologous sequences at this position in eight
Drosophila genomes [35]. This analysis showed a
statistically significant PRE score for this site in D. ananassae, D. pseudoobscura, and
D. persimilis, but not in the melanogaster
subgroup. A maximum likelihood analysis suggests that the PRE was present in the
common ancestor of the species under consideration and was lost in the
melanogaster subgroup (Figure
S3). To gain further insight into global gain and loss of PREs during the
evolution of the D.
melanogaster lineage, we carried out genome-wide comparisons
with eight genomes as described in Materials and Methods. From this analysis, it
can be inferred that 33 PREs have been gained in D. melanogaster (Figure S4
and Table
S2). For only one of these 33 PREs does the nearest gene,
scribbled (scrib; FBgn0026178), have another
PRE, and one gene, CG12852 (FBgn0085383), has gained two PREs
without having any further PRE. Thus, 30 of these PREs are associated with genes
that previously had no PRE. Taken together, these data indicate that PREs can
arise from nonfunctional sequence, and furthermore suggest that genes can newly
acquire PcG regulation.

Discussion

By using predictive methods that identify Drosophila PREs
independent of their genomic position, in combination with experimental validation
at selected loci, we document three kinds of evolutionary plasticity: the numbers of
PRE elements, their motif composition, and their genomic position all change rapidly
in evolution. By demonstrating that PRE evolution is not limited to the adaptation
of preexisting elements [22], these findings document a novel dimension of
cis-regulatory evolution.

How Do PREs Change Position?

For the PREs that have changed position, there are several possible mechanisms by
which a PRE may be lost from one site and gained at another, all of which may be
at play in shifting the PRE landscape between species. For example, PREs may
move by a simple microinversion event [54]. However, the evolutionary
plasticity that we document here mainly involves the loss or gain of PRE
function from orthologous sequences that do not contain inversions; thus, other
mechanisms must be considered. First, PREs may move by
“creeping” from one site to the other. In this model, a
sequence adjacent to a PRE may acquire new functional motifs, thus shifting the
centre of PRE function to a slightly different location. By accumulation of such
small shifts, the PRE could effectively move to a new position. Sequence
insertions could accelerate this process. We observe such an insertion in the
salm PRE (Figure 3

Second, ancestral PREs may lose their function at different sites in different
lineages, resulting in an apparent change of position. Third, a PRE could change
its position by de novo evolution from nonfunctional sequence. We infer from
comparative genomics that this is the case for at least 33 PREs in
D. melanogaster.
It has been shown theoretically that enhancers could evolve rapidly from
nonfunctional sequence, provided that the DNA motifs are simple, and that there
is sufficient raw material in the form of “presites” that
differ from functional sites by a single nucleotide [41]. This suggests
that, as proposed [55], nonfunctional sequences may be
“elected” to take up a role as PREs by relatively few
nucleotide changes. We have examined this possibility for selected
Drosophila PREs that occur at nonorthologous positions in
different species by allowing single base changes in any motif and plotting
sites of “pre-PRE” potential. We find that sites of PRE
function in one species correspond to sites of high potential in a second
species, so that a new PRE could theoretically emerge with very few nucleotide
changes (Figure
S5).

Why Do PREs Evolve So Rapidly?

What is the evolutionary significance of PRE plasticity? Many studies of
enhancers have shown that small differences in sequence can lead to large
phenotypic differences [21,22,28–30]; thus, one may
expect the same to be true for PREs. However, it is important to bear in mind
one functional difference between enhancers and PREs, namely that
enhancers respond to differences in cellular concentrations of the transcription
factors that bind them, whereas PREs respond to the activity state of their
cognate promoter, and not to local differences in the concentrations of the PcG
and TrxG proteins [8]. Thus, PREs may be more tolerant than enhancers to changes
in number of binding sites, and indeed to changes in the number of PREs at a
given locus. On the other hand, the only feature of enhancers that has been
studied is motif turnover. It remains to be seen whether enhancers display
evolutionary plasticity similar to that of PREs. Given the flexible nature of PRE design, we envision several possible effects of
evolutionary plasticity, which may operate differently at different PREs. First,
many differences in PRE number and sequence between species may be tolerated by
the organism without causing large phenotypic differences. Indeed, the body
plans of the different species are very similar. Thus, some PREs may work to
maintain phenotype in the face of environmental differences. For example, one of
the most important environmental constraints on different
Drosophila species from different latitudes is temperature. In
D. melanogaster,
the PcG proteins are profoundly sensitive to the temperature at which the flies
are raised [52], giving more potent silencing at higher temperatures. Thus,
for some PREs, the plasticity in design that we observe may play a role in
“buffering” the system against different temperatures, such
that the transcriptional output of the locus is conserved. In addition, PREs may
mediate phenotypic plasticity for thermosensitive traits such as pigmentation.
Several of the loci involved in the plasticity of pigmentation (e.g.,
Abd-B) are regulated by PREs [56]. On the other hand, for some PREs, differences in design may have a direct effect
on phenotype. Several studies have documented large effects on PRE function
caused by changes in one or a few binding sites [44,57,58]. Thus, we propose that some of
the changes we observe would affect the silencing or activation response of the
PRE, thus in turn affecting the level of target gene transcription that is
maintained, and giving selectable effects on phenotype. For example, one of the
major phenotypic differences between Drosophila species is the
male sex combs. The sex comb is one of the most rapidly diversifying organs in
Drosophila species, and is important for male reproductive
success [59]. Evolutionary diversity in sex comb number is associated
with diversity in regulation of the hox gene Sex-combs reduced
(Scr), which is a well-characterised target of PcG
regulation [60,61]. In D.
melanogaster, D.
simulans, and D. yakuba, a single row of
sex comb teeth is present, whereas D. pseudoobscura has two such rows. Interestingly, a
microinversion event on the 3′ side of the D. pseudoobscura
Scr locus [54] has removed 3′ regulatory sequences,
including one of a cluster of three Scr PREs, to a new
position. The D.
pseudoobscura PREs also show many sequence changes compared to
the other three species (unpublished data). Thus, differences in PRE sequence,
number, and position at the Scr locus correlate well with
phenotypic differences, and will provide an excellent model for further study of
the effects of PRE plasticity on phenotype. In summary, PREs act on several hundred genes in Drosophila,
many of which are master developmental regulators. We propose that the
extraordinary plasticity in PRE design that we observe may provide a rich
capacity for transcriptional buffering, phenotypic plasticity, and phenotypic
diversity between species.

Materials and Methods

Bioinformatics methods.

BLAST search. The BLAST search takes a PRE predicted in one
species and determines the orthologous position in another species. Because the
PRE will usually not be conserved as a continuous sequence, multiple adjacent
high-scoring pairs (HSPs) have to be grouped together. The grouping is done
according to the following criteria: only HSPs with a BLAST
E-value not larger than 0.01 are considered. HSPs of one group
are on the same strand. The distance between adjacent HSPs of one group is below
1 kb. Groups are maximal in the sense that no HSPs can be added that fulfil
these three criteria. From all groups that correspond to one initial PRE, we
choose the one with the largest sum of HSP lengths. If several groups have the
same length sum, the first one processed is taken (a case
that has not occurred in our analysis so far). Starting with 201 PREs in
D. melanogaster
(version 4.0), this procedure resulted in 190 orthologous regions in
D. pseudoobscura
(version 2.0), 194 in D.
simulans (version 1.0), and 176 in D. yakuba (version 1.0). In
D. yakuba, an
additional 20 fall into “chr2L_random,” which
contains clones that are not yet finished or cannot be placed with certainty at
a specific place on the chromosome. These 20 hits were not included in our
analysis.

Finding the right locus. To evaluate the validity of the BLAST
search procedure, we checked whether orthologous regions were in correct loci.
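Before the locus comparison is described, the HSP grouping rules of the BLAST search given above can be illustrated with a minimal Python sketch. It assumes that all HSPs lie on a single subject sequence, that coordinates satisfy start ≤ end, and that the distance between adjacent HSPs is measured from the furthest end reached so far to the next start. The `HSP` class and the function names are hypothetical, and this is not the code used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    start: int      # subject start coordinate (bp), start <= end
    end: int        # subject end coordinate (bp)
    strand: str     # "+" or "-"
    evalue: float   # BLAST E-value of this HSP


def group_hsps(hsps, max_evalue=0.01, max_gap=1000):
    """Group HSPs: E-value <= 0.01, same strand, neighbours less than 1 kb apart."""
    groups = []
    for strand in ("+", "-"):
        kept = sorted(
            (h for h in hsps if h.evalue <= max_evalue and h.strand == strand),
            key=lambda h: h.start,
        )
        current, reach = [], None
        for h in kept:
            if current and h.start - reach < max_gap:
                current.append(h)            # extend the group; keeps it maximal
                reach = max(reach, h.end)
            else:
                if current:
                    groups.append(current)
                current, reach = [h], h.end
        if current:
            groups.append(current)
    return groups


def best_group(groups):
    """Group with the largest sum of HSP lengths; ties go to the first processed."""
    # max() returns the first maximal element, which matches the tie rule.
    return max(groups, key=lambda g: sum(h.end - h.start + 1 for h in g), default=None)


# Toy HSPs with invented coordinates.
example = [HSP(100, 300, "+", 1e-5), HSP(900, 1100, "+", 1e-3), HSP(5000, 5100, "+", 1e-8)]
print(best_group(group_hsps(example)))
```

Sorting by start coordinate and extending a group greedily yields maximal groups, because an HSP is left out of a group only when it fails one of the three criteria.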
For each PRE from D.
melanogaster and its orthologous region in D. pseudoobscura, we compared
the distance between the PRE and the two genes closest to it with the distance
between the orthologous region and the two genes closest to that region. If a PRE was inside
a gene, only that gene was included in the comparison. In the majority of
cases (163 out of 190), this “locus shift” is below 10 kb,
although it can exceed 200 kb. In some cases (24), the ortholog of
the D. melanogaster
PRE and the ortholog of one of the (up to two) D. melanogaster genes are found
on different chromosomes. In general, there are legitimate doubts about the
reliability of the D.
pseudoobscura gene annotation. Frequently, one or more exons are
missing, which leads to too large a distance between PRE ortholog and closest
gene in D.
pseudoobscura. Additionally, we can show that the observed rare
events of chromosome changes are consistent with the gene rearrangement in the
annotation. For example, the gene CG1924 is located on
chromosome X in D.
melanogaster and on chromosome 2 in D. pseudoobscura, whereas the
adjacent genes are on chromosome X in both species.

Calculating BLAST distances (Figure 2

PRE prediction and calculation of dynamic scoring thresholds.
PRE prediction was performed using the jPREdictor software [19], which follows
the PREdictor algorithm as described in [18], except that a step size of 10
bp instead of 100 bp was used. Score cutoffs and E-values were
calculated with nonparametric statistics on random sequence data 100 times the
size of the D.
melanogaster genome, with the D. melanogaster nucleotide
distribution (29% A, 21% C, 21% G, and
29% T). A score s for which scores of
s or better occur r times in the random data
corresponds to an E-value of r/100 in the
single D. melanogaster genome. For an E-value of 1, this
score cutoff is 157. For the dynamic scoring system, cutoffs were calculated
similarly, taking into account the smaller search spaces of 1 kb, 10 kb, and 20
kb radius and the fact that about 200 such searches are performed (see Figure 2

Evolutionary gain and loss of PREs. We performed a maximum
likelihood analysis of 73 D.
melanogaster PREs in eight Drosophila
genomes. Each of these 73 PREs had been predicted genome-wide, had orthologous
regions that could be determined in all seven other species, and lacked a
functionally analogous PRE in at least one of the other species. A functionally analogous
PRE was defined as a hit predicted dynamically within a 10-kb BLAST distance.
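
The dynamically predicted hits referred to here depend on score cutoffs derived from random-sequence scores, as described under PRE prediction above. The following Python sketch illustrates that nonparametric idea only. The function names and the toy score list are hypothetical and are not part of jPREdictor. It assumes that the random data amount to `n_genomes` genome equivalents, so that a score reached r times corresponds to an E-value of r / `n_genomes`.

```python
def empirical_e_value(random_scores, s, n_genomes=100):
    """E-value of score s: number of random hits with score >= s,
    expressed per single genome equivalent."""
    r = sum(1 for x in random_scores if x >= s)
    return r / n_genomes


def score_cutoff(random_scores, e_value, n_genomes=100):
    """Score that is reached or exceeded r = e_value * n_genomes times
    in the random data (assumption: cutoff = r-th best random score)."""
    r = int(round(e_value * n_genomes))
    if not 1 <= r <= len(random_scores):
        raise ValueError("requested E-value is outside the range of the random data")
    ranked = sorted(random_scores, reverse=True)
    # r-th best score; with tied scores the actual count can exceed r.
    return ranked[r - 1]


# Toy stand-in for scores collected on random sequence (hypothetical values).
toy_scores = list(range(1, 1001))
cutoff = score_cutoff(toy_scores, e_value=1.0)
print(cutoff, empirical_e_value(toy_scores, cutoff))
```

For the smaller search spaces of the dynamic scoring system, the same counting would presumably be applied with a correspondingly rescaled denominator. The text does not spell out that rescaling, so the sketch does not attempt it.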
The eight species comprise those for which the efficacy of our predictive method
has been well established (up to D.
pseudoobscura). We employed a probabilistic model whose
separate gain and loss parameters were estimated with the Mesquite software
(http://mesquiteproject.org) on the given contemporary character
states: 1 for a (functionally analogous) PRE being present in the respective
species, 0 for no such PRE being present. Subsequently, maximum likelihood
ancestral character states were reconstructed based on the estimated parameters.
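
To make the model concrete, the sketch below implements a generic two-state continuous-time Markov model with separate gain and loss rates and evaluates the likelihood of PRE presence at the root by the standard pruning recursion. It is not Mesquite's code. The tree, branch lengths, species labels, and rate values are invented for illustration, the rates are fixed rather than estimated, and a flat prior over the two root states is assumed.

```python
import math


def transition_matrix(gain, loss, t):
    """2x2 matrix P[i][j] = probability of state j after time t, starting in i.
    State 0 = no PRE, state 1 = PRE present."""
    total = gain + loss
    decay = math.exp(-total * t)
    pi1 = gain / total  # equilibrium probability of state 1
    pi0 = loss / total  # equilibrium probability of state 0
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],
    ]


def conditional_likelihoods(node, states, gain, loss):
    """Likelihood of the observed leaf states below `node`, for each state at `node`.
    A node is either a leaf name (str) or a list of (child, branch_length) pairs."""
    if isinstance(node, str):
        return [1.0, 0.0] if states[node] == 0 else [0.0, 1.0]
    like = [1.0, 1.0]
    for child, t in node:
        p = transition_matrix(gain, loss, t)
        child_like = conditional_likelihoods(child, states, gain, loss)
        for s in (0, 1):
            like[s] *= p[s][0] * child_like[0] + p[s][1] * child_like[1]
    return like


def root_presence_likelihood(tree, states, gain, loss):
    """Normalised likelihood of state 1 at the root (flat root prior assumed)."""
    l0, l1 = conditional_likelihoods(tree, states, gain, loss)
    return l1 / (l0 + l1)


# Hypothetical three-species tree: ((sp_A:0.1, sp_B:0.1):0.2, sp_C:0.3)
toy_tree = [([("sp_A", 0.1), ("sp_B", 0.1)], 0.2), ("sp_C", 0.3)]
toy_states = {"sp_A": 1, "sp_B": 0, "sp_C": 0}  # 1 = PRE present, 0 = absent
print(root_presence_likelihood(toy_tree, toy_states, gain=1.0, loss=1.0))
```

In a full analysis, the gain and loss rates would first be chosen to maximise the likelihood of the observed character states across the tree, and only then used for the ancestral reconstruction.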
Defining a D.
melanogaster PRE whose most ancestral node (the root of the
tree) has a PRE likelihood smaller than 0.5 as having been gained during evolution
resulted in 33 such PREs, listed in Table S2. Figure S4
shows the trees for the 73 PREs.

Fly methods.

Strains and handling. For polytene chromosomes and ChIP,
D. melanogaster
wild-type flies (Oregon R) were used. For the other species, the strains used
for whole-genome sequencing were obtained from http://stockcenter.arl.arizona.edu/. Stock numbers:
D. yakuba
14021-0261.01; D.
simulans 14021-0251.195; and D. pseudoobscura 14011-0121.94.
With the exception of D.
pseudoobscura, all species were raised on cornmeal food. For
D.
pseudoobscura, standard banana-Opuntia food was
prepared as specified at http://stockcenter.arl.arizona.edu/.

Transgenics. Genomic fragments of 1.5 to 1.6 kb were amplified by PCR from genomic DNA of each
species and cloned using SpeI/NotI sites into the pUZ P-element vector upstream
of the miniwhite reporter gene [18]. Embryo injections were carried
out by Vanedis Drosophila injection service (http://www.vanedis.no). Chromosomal mapping and crosses to
PcG and trxG mutants were performed as
described [18]. Primer sequences, constructs, and transgenic fly lines are
available on request.

Polytene chromosome staining. Polytene chromosomes were prepared from third instar larvae of all four species
and stained with rabbit polyclonal anti-Polycomb antibody or anti-H3K27me3
(provided by Thomas Jenuwein) as described in [62].

Western blotting. Protein extracts were made from
0–12-h-old embryos for all four species, as described in
[63].
Western blots were probed with antibodies against PC, PH, H3K27me3, or H3
(Upstate).

Chromatin immunoprecipitation (ChIP). ChIP on whole embryos of
D. melanogaster,
D. simulans,
D. yakuba, and
D. pseudoobscura
was performed using anti-PC and -PH antibodies, as described [64]. Two independent
chromatin preparations from 0–16-h-old embryos and two to four
independent ChIP assays were performed for each species. Enrichments of
immunoprecipitated DNA over input DNA were quantified by real-time PCR using
SYBR green (Sigma). Three technical replicates were performed for each primer
pair on each chromatin preparation. Primers were designed to amplify a fragment
of 100 to 300 bp within the highest scoring region of each predicted PRE (or the
minimal PRE, if known), or of the orthologous region in the species in which no
PRE was predicted. Primer sequences are available on request.

Accession Numbers

The FlyBase IDs for the genes and gene products mentioned in this paper are as
follows: abd-A (FBgn0000014); Abd-B
(FBgn0000015); CG12852 (FBgn0085383); dpp (FBgn0000490); DSP1
(FBgn0011764); GAF (FBgn0013263); GRH (FBgn0259211); H3 (FBgn0001199); KLF
(FBgn0040765); PC (FBgn0003042); PH (FBgn0004861); PHO (FBgn0002521);
salm (FBgn0004579); scrib (FBgn0026178);
SP1 (FBgn0020378); trh (FBgn0003749); Ubx
(FBgn0003944); upd2 (FBgn0030904); and ZESTE
(FBgn0004050).

Figure S1: The Fab-7 PRE Has Conserved Position and Shows Motif
Turnover

(A) PRE prediction score plots for Fab-7 PRE at orthologous
regions of D.
melanogaster and D. pseudoobscura genomes. Grey bars below each score
plot indicate the regions shown in detail in (C). Black boxes below plots
indicate the position of PCR fragments used for real time PCR detection in
ChIP analysis.

(B) ChIP enrichments of PC and PH on Fab-7 PRE in
D. melanogaster and D. pseudoobscura embryos.

(C) Motif occurrence is independent of sequence conservation. The
high-scoring region of each PRE is shown. Coordinates of sequences shown
from left to right of the figure are as follows: D. melanogaster:
12725760–12724576, and D. pseudoobscura: 631891–630632. Annotation
as for Figure 3

(376 KB PDF)

Figure S2: A PRE Is Present in D.
melanogaster but Absent in D. yakuba and
D.
pseudoobscura at the unpaired2
(upd2; FBgn0030904) Locus

(A) PRE prediction score plots for upd2 at orthologous
regions of D.
melanogaster, D.
yakuba, and D. pseudoobscura genomes (the D. simulans sequence for
this locus is incomplete). Coordinates of sequences shown from left to right
of the figure are as follows: D.
melanogaster: 18081000–18071000,
D. yakuba:
16868497–16858497, and D. pseudoobscura: 6996333–7006333. The
upd2 transcription unit is shown. Black boxes at the
top of plots indicate the position of PCR fragments used for real-time PCR
detection in ChIP analysis.

(B) PC shows strong ChIP enrichment on predicted upd2 PRE in
D.
melanogaster, but no detectable enrichment on the
orthologous sequences in D.
yakuba and D. pseudoobscura embryos,
for which no PRE is predicted.

(336 KB PDF)

Figure S3: Evolution of the Extra D.
pseudoobscura PRE in the iab3 Region of
the BX-C

The D.
pseudoobscura PRE shown in Figure 4

(217 KB PDF)

Figure S4: Phylogenetic Trees for 73 PREs in Eight Species

The analysis was performed as described in Materials and Methods. The
coordinates of the PRE and name of the closest gene are given. The circular
nodes at the leaves (those marked with species names) indicate absence (open
circles) or presence (solid circles) of PREs in the respective species.
Internal nodes (those that are not leaves) indicate likelihoods of
reconstructed character states, with more solid circles representing larger
likelihoods of ancestral PREs. Defining a D. melanogaster PRE whose
most ancestral node (the root of the tree) has a PRE likelihood smaller
than 0.5 as having been gained during evolution resulted in 33 such PREs, listed
in Table
S2.

(730 KB PDF)

Figure S5: The D. pseudoobscura trh Locus

(A) PREdictor score plot. Site 1 is predicted to be a PRE in D. pseudoobscura. Site 2 is
not predicted to be a PRE in D.
pseudoobscura, but in D. melanogaster (see Figure 4

(B) Score plot of PRE potential (“pre-PRE score”; see
Discussion). Site 2 shows strong
PRE potential, coinciding with the position of the PRE in D. melanogaster.

(408 KB PDF)

Table S1: PREs Whose Positions Are Conserved in All Four Species

Maxd (column 1) is the maximum distance in base pairs found between the
centres of two predicted PREs in any two genomes (see Materials and Methods). The closest annotated
D.
melanogaster gene to each PRE (column 2) and the distance in
base pairs from PRE to gene (column 3) are shown. When the PRE is within the
coding gene, this distance is given as zero. PREs of the homeotic complexes
in D.
melanogaster are indicated (column 4) using the nomenclature
of [18]. In these cases, the gene that has been shown to be
regulated by the PRE is shown.

(49 KB DOC)

Table S2: Gained PREs in D. melanogaster.

Gained PREs were identified as described in Materials and Methods. The table
lists the 33 PREs that are inferred to have been gained in D. melanogaster using a
BLAST distance of 10 kb. The coordinates of the PRE are given, and the
closest gene and its distance to the PRE are listed. Distances of zero
indicate that PRE and gene overlap.

(89 KB DOC)

Acknowledgments

The authors thank Betül Hekimoglu, Heidi Ehret, and Ann Mari Voie for
experimental assistance, Thomas Fiedler for bioinformatic input, and Thomas Jenuwein
for providing anti-H3K27me3 antibody.

Abbreviations

Footnotes

Author contributions. AH, LR, and MR conceived and designed the
experiments. AH performed bioinformatic analysis. CA cloned transgenic
constructs. LR performed all other wet-lab experiments. AH, LR, and MR analyzed
the data. MR supervised AH. LR supervised CA. RP supervised LR until December
2005. LR and MR wrote the manuscript.

Funding. AH was supported by the International NRW (North
Rhine-Westphalia) Graduate School in Bioinformatics and Genome Research, LR and
RP by the Deutsche Forschungsgemeinschaft and the European Union Sixth Framework
Programme Network of Excellence (EU FP6 NoE) “The
Epigenome,” LR and CA by the Austrian Academy of Sciences and the EU
FP6 NoE “The Epigenome” NET programme, and MR by the
Deutsche Forschungsgemeinschaft, Bioinformatics Initiative.

Competing interests. The authors have declared that no competing
interests exist.

References