Copyright © 2007 by the Genetics Society of America

Phylogenetic Footprinting Analysis in the Upstream Regulatory Regions of the Drosophila Enhancer of split Genes

Department of Biology, Connecticut College, New London, Connecticut 06320

1Corresponding author: Department of Biology, Connecticut College, 270 Mohegan Ave., New London, CT 06320. E-mail: daeas/at/conncoll.edu
Communicating editor: W. M. Gelbart
Received January 4, 2007; Accepted July 4, 2007.

Abstract

During Drosophila development Suppressor of Hairless [Su(H)]-dependent Notch activation upregulates transcription of the Enhancer of split-Complex [E(spl)-C] genes. Drosophila melanogaster E(spl) genes share common transcription regulators including binding sites for Su(H), proneural, and E(spl) basic-helix-loop-helix (bHLH) proteins. However, the expression patterns of E(spl) genes during development suggest that additional factors are involved. To better understand regulators responsible for these expression patterns, recently available sequence and annotation data for multiple Drosophila genomes were used to compare the E(spl) upstream regulatory regions from more than nine Drosophila species. The mγ and mβ regulatory regions are the most conserved of the bHLH genes. Fine analysis of Su(H) sites showed that high-affinity Su(H) paired sites and the Su(H) paired site plus proneural site (SPS + A) architecture are completely conserved in a subset of Drosophila E(spl) genes. The SPS + A module is also present in the upstream regulatory regions of the more ancient mosquito and honeybee E(spl) bHLH genes. Additional transcription factor binding sites were identified upstream of the E(spl) genes and compared between species of Drosophila. Conserved sites provide new understandings about E(spl) regulation during development. Conserved novel sequences found upstream of multiple E(spl) genes may play a role in the expression of these genes.
REGULATION of gene expression is central to the process of cell differentiation during development. Multiple levels of regulation allow tight control of gene activation that results in specific patterns of expression and cell-type specification. At one level, gene expression patterns are controlled by proteins that bind to specific DNA sequences in the regulatory regions of genes and either activate or repress transcription. It is becoming increasingly clear that transcriptional regulation often depends on the interaction between multiple proteins and that the cellular context determines which players will be available to bind and affect transcription (Levine and Tjian 2003). Transcription factor binding sites clustered together in specific orientations are termed modules. These modules coordinate protein–protein interactions that result in activation or repression of gene expression (Halfon et al. 2002). The Drosophila Enhancer of split [E(spl)] genes present an interesting study of complex modular transcriptional regulation. The E(spl) locus contains 13 different genes: HLHm3, HLHm5, HLHm7, HLHm8, HLHmβ, HLHmγ, and HLHmδ encode a set of basic-helix-loop-helix (bHLH) transcriptional repressors; mα, m2, m4, and m6 code for proteins that are members of the Bearded family; m1 codes for a putative protease inhibitor; and groucho codes for a transcriptional corepressor (Delidakis and Artavanis-Tsakonas 1992; Knust et al. 1992; Paroush et al. 1994; Lai et al. 2000). Importantly, many of the E(spl) genes in Drosophila melanogaster share the same transcription factor binding sites in their promoter regions yet show different patterns of expression during development, suggesting that additional differential coregulators are involved (De Celis et al. 1996; Nellesen et al. 1999; Wech et al. 1999). 
A subset of the E(spl) genes is regulated by a complex of regulatory proteins bound to a specific module containing binding sites for Suppressor of hairless [Su(H)] and proneural proteins (Nellesen et al. 1999; Cave et al. 2005). Expression of the majority of the E(spl) genes is activated by Notch signaling (Jennings et al. 1994; Bailey and Posakony 1995; Eastman et al. 1997; Lai et al. 2000). The Notch pathway is conserved from worms to humans and regulates cell fate decisions in a wide variety of tissues (reviewed in Artavanis-Tsakonas et al. 1999; Baron 2003). The significance of the Notch pathway has been underscored by the discovery that altered forms of pathway members, including the E(spl) genes, cause a variety of human diseases including Alzheimer's, CADASIL, Alagille's syndrome, neoplasia, and neuroblastoma (reviewed in Joutel and Tournier-Lasserve 1998; Axelson 2004a,b). Notch codes for a transmembrane protein, which acts as the receptor in the signaling pathway. Notch and its ligands undergo post-translational events that are essential for proper signaling (reviewed in Chan and Jan 1998; Haines and Irvine 2003). The outcome of these events is the release of the intracellular domain of Notch (known variously as Nact, ICN, or NICD), which acts as a transcriptional activator (Struhl and Adachi 1998) and activates gene expression in combination with Su(H). In the absence of the Notch intracellular domain, Su(H) interacts with corepressors such as Hairless, dCtBP, Groucho, and SMRTR, to inhibit transcription (Morel et al. 2001; Barolo et al. 2002; Tsuda et al. 2002; Nagel et al. 2005). When Notch is activated, it binds to Su(H) and transforms it from a negative regulator to a positive regulator (Furriols and Bray 2001; Morel et al. 2001). Although NICD coupled with Su(H) induces the expression of many of the members of the E(spl)-C (Jennings et al. 1994; Bailey and Posakony 1995; Eastman et al. 1997; Lai et al. 
2000), they are most likely not the only regulators of these genes. The distinct expression patterns of the different E(spl) genes, combined with the finding that activated Notch and Suppressor of Hairless are capable of eliciting only limited transcription of the genes (Cooper et al. 2000), indicate a requirement for additional transcriptional regulators that direct expression of the E(spl) genes (De Celis et al. 1996; Wech et al. 1999). Cell-specific activation of some of the E(spl) genes requires interaction between NICD and proneural proteins (Cooper et al. 2000). Synergistic transcriptional activation by these proteins requires a specific organization of the upstream regulatory regions of the target genes (Cave et al. 2005). Since a specifically oriented pair of Su(H) binding sites [Su(H) paired site, SPS] and a bHLH activator binding site (A) are necessary for Notch–proneural synergy it has been proposed that the orientation and/or conformation of Su(H) bound to these sites allows direct interaction with Daughterless, a bHLH activator protein that interacts with Achaete–Scute proneural proteins (Cave et al. 2005). Only a subset of E(spl) promoters contains the SPS + A architecture, suggesting that other combinatorial factors may bind to different modules in the promoters of the E(spl) genes and direct their distinctive expression patterns during development (Nellesen et al. 1999; Cave et al. 2005). To better understand the complex regulation of the E(spl) genes we analyzed the upstream regulatory regions of these genes using bioinformatic and phylogenetic footprinting approaches. The principle of phylogenetic footprinting is based on the fact that functional sequences tend to evolve at a slower rate than nonfunctional sequences. Thus sequences that are conserved in multiple and distantly related species are more likely to be functional than nonfunctional (Bergman et al. 2002). 
Recently, the fully sequenced, assembled, and annotated genomes of more than nine different Drosophila species have been released (Crosby et al. 2007). We have used multiple DNA alignment and scanning tools to compare and analyze the E(spl) regulatory regions in these different species. We have identified multispecies conserved sequences (MCSs) that relate to known functional binding sites including Su(H), proneural, and bHLH repressor sites. We have also discovered additional transcription factor binding sites and novel shared sequences that are conserved in nine different Drosophila species. These results provide insights into the evolution of the E(spl) locus and information about additional factors that regulate these genes. MATERIALS AND METHODS Reference DNA sequences: D. melanogaster reference DNA for the E(spl) genes was obtained from NCBI Entrez, http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi, or from the University of California Santa Cruz (UCSC) Genome Browser (Kent 2002), http://genome.ucsc.edu/cgi-bin/hgGateway (Assembly April 2004). The following promoter fragments were used as reference D. melanogaster sequences: m1, 1500 bp (Wurmbach et al. 1999; UCSC Genome Browser); m2, 1500 bp (Wurmbach et al. 1999; UCSC Genome Browser); HLHm3, 1700 bp (Nellesen et al. 1999); m4, 1384 bp (Singson et al. 1994); HLHm5, 1500 bp (Nellesen et al. 1999; UCSC Genome Browser); m6 (Wurmbach et al. 1999; UCSC Genome Browser); HLHm7, 1700 bp (Singson et al. 1994; UCSC Genome Browser); HLHm8, 1500 bp (Klambt et al. 1989); mα, 1341 bp (Nellesen et al. 1999); HLHmβ, 1025 bp (Nellesen et al. 1999); HLHmγ, 1200 bp (Delidakis and Artavanis-Tsakonas 1992); and HLHmδ, 2000 bp (Nellesen et al. 1999; UCSC Genome Browser). Reference sequences from all other Drosophila species were accessed from FlyBase (Grumbling and Strelets 2006) using FlyBase BLAST (http://flybase.bio.indiana.edu/blas) and GBrowse (http://flybase.bio.indiana.edu/cgi-bin/gbrowse/dmel/). 
Reference sequences for Apis mellifera and Anopheles gambiae E(spl) promoters were obtained from the Ensembl browser (Hubbard et al. 2005) (http://www.ensembl.org) and the UCSC Genome Browser (A. mellifera, Assembly January 2005 and A. gambiae, Assembly February 2003). BLAT and EvoPrinter analysis: D. melanogaster reference sequences for each of the E(spl) promoters were pasted into the BLAT search engine window (http://genome.ucsc.edu/cgi-bin/hgBlat) and individually compared to eight different test species: D. simulans (Assembly April 2005), D. yakuba (Assembly November 2005), D. erecta (Assembly August 2005), D. ananassae (Assembly August 2005), D. pseudoobscura (Assembly November 2004), D. virilis (Assembly August 2005), D. mojavensis (Assembly August 2005), and D. grimshawi (Assembly August 2005). Percentages of identities between species were determined by calculating the number of conserved sites from the highest-scoring BLAT alignment outputs. For multispecies alignments, the highest-scoring BLAT readout alignment for each test species was selected and pasted into an EvoPrinter (Odenwald et al. 2005) input window (http://evoprinter.ninds.nih.gov). EvoPrinter outputs (EvoPrints) were generated using subsets of the BLAT inputs as well as BLAT readouts from all of the test species. On the basis of the Drosophila phylogenetic tree, species were sequentially added into EvoPrinter in the following order: D. simulans, D. yakuba, D. erecta, D. ananassae, D. pseudoobscura, D. virilis, D. mojavensis, D. grimshawi. We also used EvoDifference to identify sequences that are conserved in all but one of the above species. Transcription factor binding site identification: Su(H) binding sites, bHLH repressor sites, bHLH activator/proneural sites, and TATA boxes were identified in the Drosophila EvoPrint readouts for each additive species. 
The presence or absence of these sites was then confirmed by analyzing individual BLATs or by searching for the sites in the individual species sequences obtained from FlyBase (http://flybase.bio.indiana.edu/blast/). Identification of Su(H) and proneural binding sites in A. mellifera and A. gambiae was done by scanning the promoters for consensus sequences. Additional transcription factor sites in Drosophila were first identified in the D. melanogaster reference sequences using MatInspector (http://www.genomatix.de/cgi-bin/matinspector_prof/mat_fam.pl). Identified sites were analyzed for conservation in eight other Drosophila species by analyzing BLAT readouts. cis-Decoder analysis: Shared conserved sequences between the E(spl) promoters were identified using the cis-Decoder programs EvoPrint-Parser, CSB-aligner, and cDT-scanner (Brody et al. 2007; http://evoprinter.ninds.nih.gov/cisdecoder). EvoPrints for each promoter were parsed using EvoPrint-Parser to identify conserved sequence blocks (CSBs) that were ≥6 bases. To identify shared elements between E(spl) promoters, the parsed outputs were aligned using CSB-aligner. The outputs, cis-Decoder tags (cDTs), were scanned by hand to identify which cDTs were present in multiple E(spl) promoters. cis-Decoder:cDT-scanner was used to identify cDTs in the E(spl) sequences that are also found in promoters of genes known to be expressed in neuronal or mesodermal tissues during development. RESULTS Conservation in the upstream regulatory regions of the E(spl) genes: To determine the general level of conservation within the upstream regulatory regions of the E(spl) genes we used the BLAST-like alignment tool (BLAT) (Kent 2002) to index D. melanogaster E(spl) promoter region sequences and scan orthologous sequences from eight other Drosophila species for matches. 
BLAT sequence comparisons of these nine different species of Drosophila show a range of conservation in the upstream regulatory regions of the different E(spl) genes (Table 1). We used the percentage of sequence identities between species to obtain a general indication of how many bases have been conserved between species during evolution. As expected, more closely related species generally have a higher percentage of identities for the E(spl) gene promoters. The m2, HLHmβ, m4, and HLHmγ genes have the most highly conserved promoters between D. melanogaster and the distantly related species D. mojavensis, with 41.4, 40.6, 36.4, and 35.0% identities, respectively. In fact, these genes were the most conserved in almost all the species we analyzed. Furthermore, they show the greatest conservation when all nine species are compared together using a recently available multigenomic comparative tool, EvoPrinter (Odenwald et al. 2005). The EvoPrinter algorithm overlays multiple BLAT readouts from species that are all aligned with a common reference sequence. The EvoPrinter output reveals sequences that are invariant in multiple species. We did observe several exceptions: HLHmγ, m4, and HLHm3 show higher identity between D. melanogaster and D. pseudoobscura, 62.3, 58.6, and 37.7%, respectively, than the more closely related D. melanogaster and D. ananassae, 58.1, 53.8, and 32.5%, respectively.
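The two calculations described above, percentage identity from a pairwise alignment and an overlay that keeps only bases invariant in every species, can be illustrated with a minimal Python sketch. This is not the BLAT or EvoPrinter implementation. The function names, the gapped-string input format, and the choice to divide by the number of reference bases are all assumptions made for illustration.

```python
def percent_identity(ref_aln: str, test_aln: str) -> float:
    """Percentage of reference bases that are identical in the test species.

    Both arguments are rows of one pairwise alignment (equal length, '-' = gap).
    Assumption: the denominator is the number of reference bases, not the
    alignment length; the text does not specify which was used.
    """
    if len(ref_aln) != len(test_aln):
        raise ValueError("aligned rows must have equal length")
    ref_bases = sum(1 for r in ref_aln if r != "-")
    matches = sum(
        1 for r, t in zip(ref_aln, test_aln)
        if r != "-" and r.upper() == t.upper()
    )
    return 100.0 * matches / ref_bases if ref_bases else 0.0


def invariant_mask(reference: str, projections: list[str]) -> str:
    """Overlay several species onto reference coordinates.

    Each projection is the test-species base at every reference position
    ('-' where nothing aligns), so it has the same length as the reference.
    Bases identical in every species are returned in uppercase, all others
    in lowercase.
    """
    out = []
    for i, base in enumerate(reference):
        conserved = all(p[i].upper() == base.upper() for p in projections)
        out.append(base.upper() if conserved else base.lower())
    return "".join(out)
```

With toy input, `percent_identity("ACGT-ACGT", "ACGAAAC-T")` counts six identical bases out of eight reference bases, and `invariant_mask("ACGTAC", ["ACGTAC", "ACGAAC", "AC-TAC"])` lowercases the two positions that differ or fail to align in at least one species.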
Further analysis of the sequence comparisons using EvoPrinter reveals interspersed regions referred to as MCSs (Odenwald et al. 2005), which have been conserved in the promoters of the E(spl) genes. Twelve conserved regions with sizes between 10 and 48 bp are found in both of the HLHmγ and HLHmβ promoters (Figure 1, A and B).
These results show that there are differences in the levels of conservation in the promoters of the different E(spl) genes, which may shed light on the evolution of these regulatory regions. In addition, specific sequences within these promoter regions have been conserved in at least nine different species of Drosophila, suggesting that they may play important functional roles in regulating the expression of these genes. Su(H) site conservation in the E(spl) promoter regions: Since three classes of Su(H) binding sites, high-affinity (YGTGRGAA) single, high-affinity paired, and low-affinity (RTGRGAR) single, have already been identified in the D. melanogaster E(spl) genes (Bailey and Posakony 1995; Eastman et al. 1997; Nellesen et al. 1999), we were interested in determining whether these sites are preferentially conserved in other Drosophila species. Although paired high-affinity sites have been shown to play a role during sensory organ precursor (SOP) formation (Nellesen et al. 1999; Cave et al. 2005), single high-affinity site and low-affinity site functionality during Drosophila development has not been as well characterized. We analyzed the E(spl) promoters to determine the level of conservation of all three types of Su(H) binding sites throughout nine species of Drosophila (Table 2). Approximately 70% (23 of 33) of the high-affinity Su(H) binding sites have been completely conserved in the E(spl) bHLH promoters whereas only 13% (3 of 23) of the low-affinity sites are conserved. Twelve of the 23 conserved high-affinity sites are part of a paired site.
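Scanning a promoter for the two Su(H) consensus sequences given above (YGTGRGAA and RTGRGAR) amounts to expanding the IUPAC ambiguity codes and searching both strands. The Python sketch below shows one way to do this. It is an illustration under stated assumptions, not the procedure used in this study; the function names and the toy sequence are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMN", "TGCAYRSWMKN")


def revcomp(consensus: str) -> str:
    """Reverse complement of a sequence or IUPAC consensus."""
    return consensus.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, consensus: str):
    """Return sorted (start, strand, matched_text) tuples for both strands.

    Start positions are 0-based on the forward strand. A lookahead is used
    so that overlapping matches are all reported. Note that a palindromic
    consensus would be reported once per strand.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", consensus), ("-", revcomp(consensus))):
        regex = re.compile("(?=(" + "".join(IUPAC[c] for c in pattern) + "))")
        for m in regex.finditer(seq):
            hits.append((m.start(), strand, m.group(1)))
    return sorted(hits)


# Hypothetical toy sequence, not taken from any E(spl) promoter.
toy = "AATGTGGGAACCTTCTCACGTT"
high_affinity = find_sites(toy, "YGTGRGAA")
low_affinity = find_sites(toy, "RTGRGAR")
```

Because every high-affinity site also contains a match to the shorter low-affinity consensus, a practical classification would need to remove low-affinity hits that fall inside a high-affinity hit; that filtering step is left out of the sketch.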
Paired Su(H) binding sites are defined as two high-affinity Su(H) binding sites of a specific type, YGTGRGAAM, where the “Y” is a T in the upstream site and a C in the downstream site and M is an A or C, spaced 30 bp apart in an inverted repeat arrangement (Nellesen et al. 1999). All paired Su(H) sites found in D. melanogaster E(spl) promoter regions (m4, HLHm7, HLHm8, HLHmγ, HLHmδ, and HLHm3) are completely conserved in eight other Drosophila species (Figure 2).
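The paired-site definition above can be turned into a small search: an upstream TGTGRGAAM site on the forward strand, followed by a CGTGRGAAM site in inverted orientation, which reads KTTCYCACG on the forward strand. The Python sketch below is one possible reading of that definition. The text does not say how the 30 bp is measured, so the sketch assumes start-to-start distance on the forward strand and exposes both the distance and a tolerance as parameters. The function name and the toy sequence are hypothetical.

```python
import re

# Upstream site TGTGRGAAM as read on the forward strand.
UPSTREAM = re.compile(r"(?=(TGTG[AG]GAA[AC]))")
# Downstream site CGTGRGAAM in inverted orientation, i.e. its reverse
# complement KTTCYCACG as read on the forward strand.
DOWNSTREAM = re.compile(r"(?=([GT]TTC[CT]CACG))")


def find_paired_sites(seq: str, distance: int = 30, tolerance: int = 0):
    """Return (upstream_start, downstream_start) pairs, 0-based.

    Assumption: 'distance' is measured from the start of the upstream site
    to the start of the downstream site. Adjust it (or widen 'tolerance')
    if the spacing is defined differently.
    """
    seq = seq.upper()
    ups = [m.start() for m in UPSTREAM.finditer(seq)]
    downs = [m.start() for m in DOWNSTREAM.finditer(seq)]
    return [
        (u, d)
        for u in ups
        for d in downs
        if abs((d - u) - distance) <= tolerance
    ]


# Hypothetical toy sequence: two sites whose starts are 30 bp apart.
toy = "GG" + "TGTGGGAAA" + "AC" * 10 + "A" + "TTTCTCACG" + "CC"
pairs = find_paired_sites(toy)
```

Keeping the distance as a parameter makes it easy to test how sensitive a candidate paired site is to the spacing convention before drawing conclusions about conservation.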
Although many D. melanogaster E(spl) bHLH promoter regions contain single high-affinity and low-affinity binding sites (Nellesen et al. 1999; Cave et al. 2005), we found that very few of them are conserved (Table 2). HLHmγ and HLHmβ are the only E(spl) bHLH genes with conserved single high-affinity Su(H) sites and HLHmγ is the only gene to have a conserved low-affinity site. mα, m2, m4, and m6, which are Su(H)-dependent and Notch-responsive non-bHLH genes at the E(spl) locus (Nellesen et al. 1999; Wurmbach et al. 1999; Lai et al. 2000), have conserved high-affinity and low-affinity Su(H) binding sites (Nellesen et al. 1999). In addition to analyzing the conservation of D. melanogaster Su(H) sites we determined whether nonconserved sites were present upstream of one bHLH gene, HLHmβ, and one Bearded-like gene, m4, in the same set of Drosophila species. The total number of Su(H) sites, not just those first identified in D. melanogaster, was identified for m4 and HLHmβ in nine species of Drosophila (supplemental Table S12 at http://www.genetics.org/supplemental/). In m4 only the conserved D. melanogaster sites are present in all the other species. However, in HLHmβ, which contains only one high-affinity site and no low-affinity sites in D. melanogaster, one low-affinity site is present in D. ananassae and D. virilis and two low-affinity sites are present in D. mojavensis. Although no expression data are yet available for the E(spl) genes in these species, it is tempting to speculate that the extra low-affinity sites allow either for enhanced expression or for different patterns of expression in these species. Overall, high-affinity Su(H) binding sites, particularly in the paired configuration, are highly conserved throughout the Drosophila genus, further supporting their critical roles in regulating E(spl) expression. 
There are fewer conserved single high-affinity and low-affinity sites; however, those that are conserved may very well be functional and regulate several of the E(spl) genes via a different mechanism than the paired sites. bHLH activator and repressor site conservation in the E(spl) promoters: Proneural bHLH activator proteins play a role in the regulation of E(spl) transcription and almost all of the E(spl) promoters in D. melanogaster contain binding sites (class A E boxes) for these proteins (Heitzler et al. 1996; Cooper et al. 2000; Li and Baker 2001). Several of these sites have already been shown to be functional and necessary for expression of the E(spl) genes in proneural clusters (Cooper et al. 2000). Two different types of proneural binding sites (class A E boxes) with consensus sequences of GCAGSTG and AWCAKGTG are preferentially bound by Achaete–Scute (Ac/Sc) (Singson et al. 1994) and Atonal (Powell et al. 2004) proteins, respectively. We examined the presence of both types of bHLH activator sites first in D. melanogaster and then in nine species of Drosophila to determine whether they were conserved. In D. melanogaster, all of the proneural binding sites upstream of the E(spl) bHLH genes are of the Ac/Sc type. Atonal sites are only found upstream of m2 and m4 (Table 3). In all of the E(spl) promoters that contain proneural binding sites in D. melanogaster, except for HLHmβ, at least one site is conserved in all species (Table 3). In addition to the core consensus sequences, the flanking sequences are also highly conserved (supplemental Figure S1 at http://www.genetics.org/supplemental/). Although proneural sites were analyzed independently from the SPS module, many of the conserved sites were found to be located relatively close to the SPS sites (Figure 3).
The upstream regulatory regions of all of the E(spl) genes also contain bHLH repressor binding sites, and it has been postulated that they act to allow cross-regulation of these genes by one another (Nellesen et al. 1999; Cave et al. 2005). Three classes of bHLH repressor sites (N box, class C E box or Hairy site, and class B E box) are present, although none of these sites have yet been shown to be required for any functions in vivo. We examined these sites across 10 Drosophila species and identified at least one conserved bHLH repressor site in the promoters of all of the E(spl) genes, except for HLHm3 (Table 4). Conserved N boxes, which have low-affinity binding by E(spl) bHLH repressors (Jennings et al. 1999), are present upstream of m2, HLHm7, HLHm8, HLHmβ, HLHmγ, and HLHmδ. Class B E boxes, which are bound with higher affinity by E(spl) bHLH proteins (Jennings et al. 1999), are conserved only in m4 and HLHmδ. The most frequently occurring site in D. melanogaster is the N box; however, only 27% (7/26) of these sites are completely conserved in 8 other species whereas 83% (5/6) of the more rare class B E box sites are conserved. By comparison 33% (3/9) of the class C E box/Hairy sites are conserved.
Although nearly all of the E(spl) gene promoters contain at least one conserved bHLH repressor type site, the sites are not as well conserved as the proneural sites. Further analysis of the total number of bHLH repressors upstream of m4 and HLHmβ suggests that additional sites, some in different locations and some of different classes (supplemental Table S12 at http://www.genetics.org/supplemental/), may either supplant the sites that are not conserved or allow for variable expression patterns in different species. SPS + A architecture is conserved across nine Drosophila species: The SPS + A [paired Su(H) sites coupled with a proneural site] architecture has been shown to be critical for Notch–proneural synergistic induction of transcription (Nellesen et al. 1999; Cave et al. 2005). The spatial organization of these sites is in concordance with the direct interaction between Su(H) and proneural proteins (Cave et al. 2005). We found that this overall organization is conserved in all nine species of Drosophila that we analyzed (Figure 3A). In summary, the overall conservation and organization of the SPS + A architecture in nine different Drosophila species further supports its role as an important transcriptional module. It is tempting to speculate that the differences in proneural site location and number upstream of HLHm7 may reflect some constraint in the Su(H) proneural synergistic mechanism since HLHm7 is the only E(spl) gene with a paired site that does not have a proneural site very close to the paired Su(H) sites. SPS + A architecture is conserved in honeybee and mosquito E(spl) gene promoter regions: To further understand the evolution of the SPS + A architecture in Drosophila dipterans, we analyzed the E(spl) upstream regulatory regions in a more ancient Dipteran, the mosquito A. gambiae and a Hymenopteran, the honey bee A. mellifera. A. gambiae has one E(spl) bHLH gene, HLHmβ/γ and one Bearded gene while A. 
mellifera has three E(spl) bHLH genes, HLHmγ, HLHmβ, and HLHmβ′, and a single Bearded gene (Schlatter and Maier 2005). The A. gambiae and A. mellifera E(spl) bHLH gene products appear to be most closely related to the HLHmβ and HLHmγ pair of D. melanogaster (Schlatter and Maier 2005). Unlike the Drosophila HLHmβ promoter, we identified paired Su(H) sites and proneural sites in the E(spl) HLHmβ promoters from both honeybee and mosquito (Figure 4A).
In addition to paired sites, we also analyzed the promoters for single Su(H) sites, proneural sites, and bHLH repressor sites. In A. gambiae HLHmβ/γ one low-affinity Su(H) site, one class C E box/Hairy site, and two N boxes were identified (supplemental Figure S2, A and B, at http://www.genetics.org/supplemental/). The A. mellifera HLHmγ promoter contains six N boxes and three class C E box/Hairy sites (supplemental Figure S2, C and D); the HLHmβ promoter contains one low-affinity Su(H) site, one class C E box/Hairy site, and one N box (supplemental Figure S2, E and F); and the HLHmβ′ promoter contains three low-affinity Su(H) sites, one proneural Atonal site, and two class C E box/Hairy sites (supplemental Figure S2, G and H). The presence of these sites in many species of Drosophila as well as in the more ancient A. mellifera and A. gambiae suggests that they play an important role in the regulation of the E(spl) bHLH genes. Identification of putative transcription factor binding sites: Although Su(H) and proneural binding sites have been shown to be critical for the regulation of E(spl) expression, many lines of evidence suggest that there may be additional factors involved (Ligoxygakis et al. 1999; Nellesen et al. 1999; Wech et al. 1999; Cooper et al. 2000; our unpublished observations). The program MatInspector (Quandt et al. 1995; Cartharius et al. 2005) was used to identify transcription factor binding sites upstream of the E(spl) genes in D. melanogaster. MatInspector identifies putative binding sites by scanning input sequences for matches to a library of position weight matrices (PWMs). The matrix library also contains information about the specificity and sensitivity of each nucleotide within all of the included consensus transcription factor binding sites. 
MatInspector calculates matrix similarity scores that range from 1 (sequence corresponds to most important nucleotides within a matrix site) to 0 (no correspondence to matrix site). Matrix similarity scores that are less than 0.8 are rejected as potential sites. Using MatInspector, we identified potential transcription factor binding sites in the D. melanogaster E(spl) promoters (Table 5). Paired homeodomain sites are present in all of the promoters, whereas Fushi tarazu, Snail, Dorsal, Zeste, and CF2-II sites are found in the majority of the promoters. We further analyzed the possible functionality of these sites by determining whether they were conserved in other Drosophila species. BLAT analysis revealed conservation of specific sites between D. melanogaster and individual Drosophila species (supplemental Tables S1–S11 and supplemental Figure 1 at http://www.genetics.org/supplemental/). We specifically searched for conserved sites in the promoter regions of the E(spl) bHLH genes to identify putative regulators that may be responsible for the distinctive expression patterns of these genes (Figure 5).
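The idea of a matrix similarity score that runs from 0 to 1, with a rejection cutoff of 0.8, can be illustrated with a simplified Python sketch. This is a stand-in for teaching purposes and is not MatInspector's scoring, which also weights positions by how strongly conserved they are. The function names and the toy count matrix are hypothetical; only the 0.8 cutoff comes from the text.

```python
def matrix_similarity(site: str, matrix: list[dict[str, float]]) -> float:
    """Score a candidate site against a position matrix, scaled to 0..1.

    'matrix' holds one dict per position, mapping each base to its count or
    frequency. The raw score is rescaled so that the best possible sequence
    scores 1 and the worst possible sequence scores 0.
    Simplification: all positions are weighted equally.
    """
    if len(site) != len(matrix):
        raise ValueError("site and matrix must have the same length")
    observed = sum(col[base] for base, col in zip(site.upper(), matrix))
    best = sum(max(col.values()) for col in matrix)
    worst = sum(min(col.values()) for col in matrix)
    if best == worst:
        return 1.0  # uninformative matrix: every sequence fits equally well
    return (observed - worst) / (best - worst)


def scan(seq: str, matrix: list[dict[str, float]], threshold: float = 0.8):
    """Slide the matrix along the forward strand; keep windows >= threshold."""
    seq = seq.upper()
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if set(window) <= set("ACGT"):  # skip windows with ambiguous bases
            score = matrix_similarity(window, matrix)
            if score >= threshold:
                hits.append((i, window, score))
    return hits


# Hypothetical 3-position count matrix (10 observed sites per column).
TOY_MATRIX = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 5, "C": 5, "G": 0, "T": 0},
]
toy_hits = scan("TTAGATT", TOY_MATRIX)
```

A fuller version would also scan the reverse strand and apply per-position weights, but the rescaling step shown here is enough to see why a threshold such as 0.8 discards windows that miss the most informative positions.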
Shared novel sequences between Drosophila E(spl) promoter regions: In addition to known transcription factor binding sites, previously uncharacterized sequences common to multiple E(spl) genes may play a role in the overlapping expression patterns observed for these genes. To identify such sequences, the programs EvoPrint-parser and cis-Decoder:CSB-Alignment (Brody et al. 2007) were used to scan the conserved regions in the promoters of the E(spl) bHLH and m4 genes for sequences present in more than one of the genes. EvoPrint-parser identifies CSBs, which are defined as continuous strands of conserved bases that average ~13 bp in length (Brody et al. 2007). cis-Decoder:CSB-Alignment was then used to compare the CSBs from all of the E(spl) genes and to identify sequences shared between two or more genes. The shared sequences identified by cis-Decoder:CSB-Alignment are called cDTs. More than 100 cDTs shared between all E(spl) promoter regions were identified (data not shown). When the analysis was restricted to the E(spl) bHLH genes and the m4 gene, as a representative of Bearded-like E(spl) genes, 49 cDTs between 6 and 10 bp in length were found to be present in at least two of the E(spl) genes (Table 6). The sequence CAATAA is present upstream of HLHm3, HLHm5, HLHm8, HLHmβ, and HLHmγ and the sequence TAATTG is present upstream of HLHm3, m4, HLHm5, HLHm8, and HLHmβ. GGATCG is upstream of three E(spl) genes: HLHm3, HLHmβ, and HLHmγ. Other possible coordinating sequences may be CACACG, which is found upstream of HLHm3, HLHmγ, and HLHmδ, and AAATCT, which is upstream of HLHm3 and HLHm7. Some cDTs are present multiple times in individual promoters (data not shown).
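The two-step logic described above, first extracting conserved sequence blocks and then looking for short sequences shared between promoters, can be sketched in Python. This is a simplified illustration and not the cis-Decoder algorithm. It assumes each EvoPrint is available as a string in which conserved bases are uppercase and nonconserved bases are lowercase, it compares fixed-length 6-mers on the forward strand only, and the promoter names and sequences are hypothetical.

```python
import re
from collections import defaultdict


def conserved_blocks(evoprint: str, min_len: int = 6) -> list[str]:
    """Return runs of at least min_len consecutive uppercase (conserved) bases."""
    return re.findall(r"[ACGT]{%d,}" % min_len, evoprint)


def shared_kmers(evoprints: dict[str, str], k: int = 6, min_len: int = 6):
    """Map each k-mer found in conserved blocks of >= 2 promoters to those promoters.

    Simplifications: exact matching only, forward strand only, fixed k.
    """
    owners = defaultdict(set)
    for name, evoprint in evoprints.items():
        for block in conserved_blocks(evoprint, min_len):
            for i in range(len(block) - k + 1):
                owners[block[i:i + k]].add(name)
    return {
        kmer: sorted(names)
        for kmer, names in owners.items()
        if len(names) >= 2
    }


# Hypothetical toy EvoPrints; uppercase = conserved in all species.
toy_evoprints = {
    "promoter_a": "acgtCAATAAGGttaaGGATCGac",
    "promoter_b": "ttCAATAActgaTTGGATCGTTcc",
    "promoter_c": "aaCACACGtt",
}
toy_shared = shared_kmers(toy_evoprints)
```

In this toy input, only the two 6-mers that occur in conserved blocks of both `promoter_a` and `promoter_b` are reported; the block in `promoter_c` is conserved but not shared, so it is dropped.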
Drosophila tissue-specific cDT libraries that contain cDTs common to genes known to be expressed in particular tissues have been generated and are publicly accessible (Brody et al. 2007). The alignment program cDT-scanner was used to identify E(spl) bHLH cDTs that are also present in known neural or mesodermal expressed Drosophila genes. Many of the cDTs identified upstream of the E(spl) genes are also present in neural or mesodermal enhancers (Table 6). Interestingly, some of the cDTs appear to be specific to E(spl) promoter regions and are not found in mesodermal or neuronal regulatory regions. In fact, the sequence GGATCG described above appears to be E(spl) specific. Given the conservation in nine Drosophila species and presence in multiple E(spl) genes the conserved sequences identified using cis-Decoder are good candidates for regulatory transcription factor binding sites. In addition to identifying shared novel sequences between the E(spl) genes, cis-Decoder analysis also revealed the presence of shared conserved sequences flanking Su(H) sites. For example, the high-affinity Su(H) site GTGG/AGAA is flanked at the 3′ end by a conserved AC in HLHmγ, HLHm5, HLHm3, HLHm7, and HLHm8. The reverse-oriented Su(H) site TTCTCAC is flanked at the 3′ end by conserved AT bases in m4, HLHm5, and HLHmδ while in HLHmγ and HLHmβ a conserved G flanks the site. Brody has also identified conserved flanking sequences that are shared among Su(H) sites in subsets of E(spl) genes (T. Brody, personal communication). These flanking sequences may indeed play critical roles in stabilizing different complexes bound to the Su(H) sites, thus playing a role in the differential expression patterns of the different E(spl) genes. DISCUSSION This comparative study of the upstream regulatory regions of the E(spl) genes in multiple species of Drosophila shows that sequences known to play critical functional roles in D. 
melanogaster during development have been completely conserved during Drosophila evolution. These results not only highlight the functional importance of these sites, but also shed light on less well understood sites and on the evolution of the E(spl) genes. Importantly, this analysis identified uncharacterized conserved sequences from all of the E(spl) bHLH promoters. These sequences may very likely play important functional roles in the regulation and diverse patterning of E(spl) gene expression. Conservation of Su(H) binding sites: One of the most striking findings from this study is that all the paired Su(H) sites identified in D. melanogaster E(spl) promoters are completely conserved, both in sequence and in orientation, in nine Drosophila species as well as in A. gambiae and A. mellifera. In addition, at least one proneural site is completely conserved, maintaining the SPS + A architecture. These results support previous findings showing that the synergistic effect of activated Notch and proneural proteins on E(spl) expression depends not just on the presence of Su(H) and proneural binding sites (Nellesen et al. 1999), but also on the correct orientation of paired Su(H) binding sites (Cave et al. 2005). Cave et al. (2005) have proposed a model for Notch proneural synergistic activation that is dependent upon protein–protein interactions between Su(H) and proneural proteins. This model predicts that the proper orientation of Su(H) sites allows bound Su(H) to interact cooperatively with proneural proteins and that together they recruit NICD and/or enable activation by NICD. This model also predicts that proneural binding sites must be nearby the paired Su(H) sites. Results shown here confirm this since in all the E(spl) genes with an SPS + A architecture, except HLHm7, the conserved A site is <60 bp away from the SPS site in nine Drosophila species. 
The spacing of the paired Su(H) and proneural sites may allow ideal interaction conditions for the bound proteins, and variations of this organization, such as that seen for HLHm7, may allow for differences in expression patterns between species. Recently, Zinzen et al. (2006) have suggested that shifted binding site spacing and organization in the upstream regulatory region of single-minded (sim) may result in varied expression patterns in the embryonic ventral midline in different species of Drosophila. The sim regulatory regions contain the same quantity and quality of single Su(H) and Twist binding sites in three different Drosophila species. However, sim expression in D. melanogaster appears to be regulated by Notch and Twist together, whereas in D. pseudoobscura and D. virilis the expression pattern suggests regulation by Notch alone. Unfortunately, E(spl) expression data are currently limited for Drosophila species other than D. melanogaster and consequently we cannot yet make direct correlations between binding sites and expression patterns for these genes. However, the binding site analysis reported here has the potential to reveal important insights into the variations of E(spl) expression that may be seen in different species.

Interestingly, the required Notch-responsive Su(H) binding sites found upstream of sim are single high-affinity sites rather than paired sites (Cowden and Levine 2002; Zinzen et al. 2006). The few conserved single high-affinity sites in the E(spl) bHLH genes are present upstream of HLHmβ and HLHmγ. Further experiments in D. melanogaster and other Drosophila species are necessary to determine whether these sites are functional and whether expression of the HLHmβ and HLHmγ genes involves Su(H)/NICD bound to these sites alone or in coordination with other transcription factors.
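The spacing criterion discussed above (a proneural A site less than 60 bp from the Su(H) sites) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study. The Su(H) patterns follow the sites quoted in the text (GTGG/AGAA and its reverse complement), whereas the generic E box pattern CANNTG, the function names, the start-to-start distance measure, and the toy sequence are all assumptions made for illustration.

```python
import re

# Su(H) motifs as quoted in the text; the E box pattern is a generic stand-in (assumption).
SUH_FORWARD = re.compile(r"GTG[GA]GAA")  # high-affinity Su(H) site, written GTGG/AGAA in the text
SUH_REVERSE = re.compile(r"TTC[CT]CAC")  # reverse complement of the forward pattern
E_BOX = re.compile(r"CA..TG")            # generic E box (CANNTG), assumed proxy for a proneural A site


def suh_sites(seq):
    """Return sorted (start, strand) tuples for Su(H) matches on either strand."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in SUH_FORWARD.finditer(seq)]
    hits += [(m.start(), "-") for m in SUH_REVERSE.finditer(seq)]
    return sorted(hits)


def e_boxes_near_suh(seq, max_gap=60):
    """Return (e_box_start, distance) for E boxes closer than max_gap to any Su(H) site.

    Distance is measured between motif start coordinates, a simplification.
    """
    seq = seq.upper()
    starts = [pos for pos, _ in suh_sites(seq)]
    near = []
    for m in E_BOX.finditer(seq):
        if not starts:
            break
        distance = min(abs(m.start() - s) for s in starts)
        if distance < max_gap:
            near.append((m.start(), distance))
    return near


# Hypothetical toy sequence, not a real E(spl) promoter.
toy = ("AAAA" + "TTCTCAC" + "A" * 10 + "GTGGGAA" + "A" * 20
       + "CAGCTG" + "A" * 100 + "CACCTG")

print(suh_sites(toy))
print(e_boxes_near_suh(toy))
```

Running the same check on aligned upstream regions from each species would show whether the spacing is maintained. Note that `finditer` reports only non-overlapping matches, so a more careful scan would be needed for overlapping sites.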
Low level of bHLH repressor site conservation: Although bHLH repressor binding sites have previously been identified in the E(spl) promoter regions, we are not aware of any critical analysis of the possible functionality of these sites before this study. It was originally proposed that bHLH repressor proteins block proneural bHLH protein activity by binding to E boxes (proneural sites) thereby inhibiting proneural DNA binding or by binding to N boxes [E(spl) bHLH sites] and recruiting corepressors to block the transcriptional activator function of the proneural proteins (Knust et al. 1992). Mutagenesis studies of N boxes in the scute promoter (Culi and Modolell 1998) and experiments showing that direct binding of E(spl) bHLH repressors to the C terminus of a proneural protein blocks its ability to activate expression (Giagtzoglou et al. 2005) suggest that direct DNA binding by the E(spl) bHLH repressors is not necessary for function. Our results support this as well, since we show that few bHLH repressor sites are conserved in the promoters of the E(spl) genes. Some of the sites that are conserved are those that overlap or are very near to the Su(H) paired sites in the SPS + A modules. As previously suggested by Culi and Modolell (1998), DNA binding by bHLH repressors to these sites may help to stabilize their interactions with proneural proteins and thus more effectively inhibit proneural activity.

Evolution of the E(spl) regulatory regions: Homologs of the E(spl) sequences have been identified in many species, from mosquitoes to humans; however, only in Drosophila is such an expansion of genes at the locus present (Maier et al. 1993; Schlatter and Maier 2005). It has been postulated that the presence of multiple genes at the Drosophila E(spl) locus was a result of duplications that not only enlarged the locus, but also allowed for diversification of function (Nellesen et al. 1999; Schlatter and Maier 2005).
It is possible that the regulatory regions were not part of this duplication, but rather evolved independently. Our data argue against this, demonstrating the presence of the conserved SPS + A architecture in multiple E(spl) promoters in Drosophila species, A. mellifera, and A. gambiae as well as in the mouse Hairy/E(spl) (HES-1) promoters (Figures 3 …).

Studies in vertebrates suggest that there are several possible outcomes when gene duplication occurs (Walsh 2003). These include degenerative mutations with loss of one of the copies, an acquired novel function for one copy, or division of specializations from the original gene into each copy resulting in complementary functions or subfunctionalization (Force et al. 1999). The presence of paired sites in the more ancient E(spl) bHLH promoters of A. mellifera and A. gambiae and subsequent loss in some of the Drosophila promoters suggest that at least some subfunctionalization occurred. Schlatter and Maier (2005) have proposed HLHmβ as the ancestral E(spl) bHLH gene since the single E(spl) bHLH gene product in A. gambiae shares the highest amino acid identity with Drosophila HLHmβ (75.7% compared to 67.4% with Drosophila HLHmγ). Our results support this since HLHmβ also showed higher promoter region sequence identity between multiple Drosophila species than HLHmγ. The presence of single high-affinity Su(H) sites upstream of only HLHmβ and HLHmγ provides further evidence for their ancient status and more general roles during development. Our identification of SPS + A modules upstream of the single A. gambiae E(spl) bHLH gene and the putative HLHmβ and HLHmγ genes in A. mellifera could suggest that HLHmβ may not be the progenitor gene since the Drosophila HLHmβ promoter region does not contain the SPS + A module. However, if subspecialization after duplication did occur it could be expected that the progenitor gene would lose specific sites retained by the new gene copy.
Thus, we postulate that HLHmβ was indeed the progenitor gene and that HLHmγ, which does contain an SPS + A module as well as single high-affinity Su(H) and low-affinity Su(H) sites in its promoter region in Drosophila, originated from the first duplication and subspecialization.

Conserved E(spl) promoter sequences may be sites for regulatory transcription factors: One of the critical questions in the understanding of the E(spl) genes is how their diverse expression patterns are achieved during development. Identification of conserved transcription factor binding sites and shared novel sequences in the promoters of the E(spl) genes presented in this study could elucidate the underlying molecules and mechanisms responsible. The high level of conservation of known functional sites for Su(H) and the proneural proteins suggests that other conserved sites may also play a role. Characterization of these sites and the proteins that bind to them may help uncover previously unidentified regulators of the E(spl) genes.

Further confidence in the role of these conserved sites comes from the identification of Dorsal binding sites upstream of HLHm7, which has been shown to be expressed in the presumptive mesectoderm and to have altered expression patterns or levels in the mesectoderm in dorsal loss-of-function mutants in D. melanogaster (Gonzalez-Crespo and Levine 1993; Wech et al. 1999). The upstream regulatory regions of HLHm7 and HLHm8 each have one Dorsal site that is conserved in nine species of Drosophila and a second site that is partially conserved. Interestingly, conserved Dorsal sites are located immediately adjacent to the Su(H) paired sites upstream of HLHm7 and HLHm8. Studies on the role of Dorsal and Notch signaling on the regulation of sim and HLHm8 expression suggest that sim requires Dorsal for Notch activation, whereas HLHm8 does not (Cowden and Levine 2002).
The presence of paired Su(H) sites upstream of HLHm8 and single sites upstream of sim may allow HLHm8 to be more sensitive to activation by Notch. It is possible that if low levels of activated Notch are present Dorsal could assist in HLHm8 expression.

Two other E(spl) bHLH genes, HLHmγ and HLHmδ, are identically expressed in a subset of proneural clusters within wing imaginal discs (De Celis et al. 1996). Although both genes have conserved SPS + A modules in their promoters and their expression levels are affected by a loss of Su(H), their expression pattern in wing discs appears to be Su(H) independent (Nellesen et al. 1999). Thus, other regulatory transcription factors must be involved in their expression. It has previously been shown that a 234-bp promoter fragment of HLHmγ is sufficient to induce the wild-type expression pattern (Nellesen et al. 1999). Two of the shared conserved sites we identified in the HLHmγ and HLHmδ promoter regions, RCAATR and GAAAGT, are found within the 234-bp functional HLHmγ fragment. This analysis highlights the possible importance of these sites and directs future experiments. Accordingly, we plan to use site-directed mutagenesis to alter these sites to determine their involvement in regulating HLHmγ and HLHmδ expression in the wing disc.

HLHm3 is the most broadly expressed E(spl) bHLH gene during embryogenesis (Wech et al. 1999) and is thus most likely regulated by a variety of factors in addition to NICD and Su(H). Conserved sites upstream of HLHm3 may predict some of the factors that are involved in this regulation. HLHm3 is maternally provided and expressed in the developing embryonic gut, the embryonic midline, the brain, and the terminal sensory organ derivation sites (Wech et al. 1999). Accordingly, there are conserved binding sites for Caudal and Hunchback, two transcription factors that are maternally expressed and play a role in early A–P axis formation (reviewed in Rivera-Pomar and Jackle 1996).
Caudal also functions during gut development (Wu and Lengyel 1998), making it a possible regulator of HLHm3 during multiple stages of development. Conserved binding sites for two additional homeodomain-containing proteins, Paired and Fushi tarazu (Ftz), which function during later embryonic development, were also identified upstream of HLHm3. Although best known for its role during segmentation, ftz is expressed in a subset of neuronal precursors in the developing CNS (Doe et al. 1988) and during hindgut formation (Krause et al. 1988). Thus, Ftz is another candidate regulator of HLHm3 during neurogenesis and gut development.

Although this study focused on identification of conserved sequences, further studies of E(spl) expression and divergent sequences in multiple Drosophila species will help to determine whether there are in fact functional differences of these genes between species. Since minimal expression and functional data are available for species other than D. melanogaster, it is unknown whether changes of expression and function of the E(spl) genes occurred in Drosophila during evolution. If the expression patterns are indeed similar, then the conserved sites may reveal more factors that are responsible for regulating these genes. Alternatively, if expression varies in different species then nonconserved sequences such as those sites identified here may hold important insights into which factors are involved. In either case the analysis we have done in multiple and distantly related species has the potential to provide important answers to the regulation of the E(spl) genes. It will be interesting to determine the effects of altered binding sites and overexpression of relevant transcription factors on E(spl) expression to confirm their proposed functions. In addition to these promoter sites, there are both transcription factor binding sites and micro-RNA sites in the 3′-UTR of these genes (Stark et al. 2003; Lai et al. 2005).
Further analysis of the conservation of sequences in these regions may reveal additional sites that allow for the overlapping, but distinct, expression patterns of the E(spl) genes.

Acknowledgments

We thank Thomas Brody for sharing unpublished data and Efrain (Beto) Zuniga for technical assistance. We are grateful to Thomas Brody, Martha Grossel, and Michael Weir for comments on the manuscript. This work was supported by National Institutes of Health grant no. R15 GM067742-01 to D.E.