Copyright © 2008 The Author(s)

A systematic characterization of factors that regulate Drosophila segmentation via a bacterial one-hybrid system

¹Program in Gene Function and Expression, ²Department of Biochemistry and Molecular Pharmacology, ³Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605 and ⁴Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

*To whom correspondence should be addressed. Phone: +1 508 856 3953; Fax: +1 508 856 5460; Email: scot.wolfe/at/umassmed.edu

Correspondence may also be addressed to Michael H. Brodsky. Phone: +1 508 856 1640; Fax: +1 508 856 5460; Email: michael.brodsky/at/umassmed.edu

Received December 21, 2007; Revised January 22, 2008; Accepted January 24, 2008.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Specificity data for groups of transcription factors (TFs) in a common regulatory network can be used to computationally identify the location of cis-regulatory modules in a genome. The primary limitation for this type of analysis is the paucity of specificity data that is available for the majority of TFs. We describe an omega-based bacterial one-hybrid system that provides a rapid method for characterizing DNA-binding specificities on a genome-wide scale. Using this system, 35 members of the Drosophila melanogaster segmentation network have been characterized, including representative members of all of the major classes of DNA-binding domains.
A suite of web-based tools was created that uses this binding site dataset and phylogenetic comparisons to identify cis-regulatory modules throughout the fly genome. These tools allow specificities for any combination of factors to be used to perform rapid local or genome-wide searches for cis-regulatory modules. The utility of these factor specificities and tools is demonstrated on the well-characterized segmentation network. By incorporating specificity data on an additional 66 factors that we have characterized, our tools utilize ~14% of the predicted factors within the fly genome and provide an important new community resource for the identification of cis-regulatory modules. INTRODUCTION The identification of cis-regulatory sequences throughout the genome and the complementary sequence-specific trans-acting factors that bind within these modules is an important step in deciphering the mechanism of spatial and temporal gene regulation in metazoans. The majority of sequence-specific transcription factors (TFs) in a eukaryotic genome can be readily identified by sequence homology to previously identified families of DNA-binding domains, where complex organisms usually contain a higher proportion of TFs (~5–10%) due to the requirement for more elaborate transcriptional regulatory networks (1). However, identifying cis-regulatory modules (CRMs) within a genome is difficult due to the more dynamic nature of these sequences relative to coding sequences (2) and the fact that the vast majority of DNA in higher eukaryotes is noncoding sequence (3). Biochemical and computational methods for the identification of CRMs have been developed, yet limitations remain. Biochemical methods based on ChIP–chip (4–6), nuclease hypersensitive sites (7,8) and 5C (9,10) allow the identification of functional elements throughout the genome. However, these techniques are limited typically to cell types that can be obtained in sufficient quantities for each protocol. 
In addition, identification of genomic binding sites by ChIP does not reveal whether those sites are functional; binding sites that are occupied in vivo may not contribute to organismal fitness, as long as they do not have negative consequences (6,11). CRMs can be computationally identified by searching for overrepresented clusters of binding sites within the genome for groups of TFs that function in a common transcriptional regulatory network (12–16). The accuracy of these predictions can be improved by incorporating phylogenetic comparisons between species separated by moderate evolutionary distances (17,18). In combination with ChIP experiments, computational analysis of evolutionary conservation provides an approach to identify functional TF-binding sites (19). The prediction of CRMs and their cognate factors via binding site cluster analysis has been most thoroughly studied in the context of the regulatory cascade driving anterior–posterior (A–P) pattern formation during embryogenesis in Drosophila melanogaster. A hierarchy of genes responsible for the systematic subdivision of the embryo into 14 segments has been defined through exhaustive genetic studies (20–22). These genes are expressed in four sequential steps—maternal, gap, pair-rule and segment polarity—with genes in each tier of the hierarchy cooperating with the previous group of factors to coordinate expression of the next set of genes (Figure 1
The small proportion of TFs with well-characterized DNA-binding specificities is not limited to D. melanogaster. This incomplete state of knowledge is representative of the majority of eukaryotic genomes and reflects the absence of high-throughput studies of factor specificities. In vitro methods for characterizing specificity include DNaseI footprinting (27), SELEX (28–31) and protein-binding microarrays (32–35). To date, these methods have not been widely adopted for large-scale analysis of TF specificities. TF specificities can also be identified as overrepresented motifs within DNA sequences identified in genome-wide TF ChIP datasets (4–6,36). When applied to the comparatively simple yeast genome, this approach successfully identified high confidence motifs for 65 of 203 (32%) of its TFs (4). The inability to determine specificities for the majority of these factors may reflect the difficulty in identifying motifs within the larger sequence segments defined by ChIP experiments and the complications associated with TFs that bind DNA in complexes with one or more other TFs.

We have previously described a bacterial one-hybrid (B1H) system for the rapid characterization of TFs (37,38). This technology has certain attributes that make it suitable as a platform for the genome-wide analysis of DNA-binding domain specificities. Selections are performed in vivo, which precludes the need to purify any given factor. Moreover, binding sites are isolated based on their ability to activate a biological response in the context of competition from a pool of potential sites in the Escherichia coli genome, which simulates the functional requirements in a eukaryotic genome. Binding sites for a factor are isolated in a single round of selection using standard molecular biology and sequencing technologies, making it accessible to most laboratories.
Here, we describe substantial improvements to the B1H system that increase its sensitivity and dynamic range, and make it amenable for the high-throughput analysis of sequence-specific TFs (Figure 2
We have supplemented our specificity database with the specificities of an additional 66 factors that were also characterized using our B1H selection system (Noyes, et al. manuscript in preparation). The combination of a large database of factor specificities coupled with web-based tools for the rapid analysis of any combination of TFs provides the community with a readily accessible tool to discover CRMs genome-wide. The combination of computational analysis based on conservation of binding sites for individual factors and experimental techniques for identifying sites in a single organism (e.g. ChIP–chip) should allow a comprehensive annotation of the CRMs throughout the genome and the TFs that function through these elements.

MATERIALS AND METHODS

Omega-based binding site selection system

The omega-based binding site selection system (Figure 2) was derived from an alpha-based B1H selection system (37,38). A detailed description of the construction of the ΔrpoZ selection strain, the omega-fusion expression vectors, the 28-bp and ZF10 randomized libraries and the binding site selection procedure is presented in the Supplementary Methods.

Factor information

The amino acid sequence for each factor used and all of the sequences of the binding sites recovered in the individual selections are provided in Supplementary Table 1, with the exception of the majority of the homeodomain sequences and selected binding sites, which will be described separately (Noyes, et al. manuscript in preparation). Sequence logos (40) for each factor were created by WebLogo (41) using the aligned motifs defined by MEME (42) identified within the B1H-selected sequences. PWMs representing the specificities of these factors are listed in Supplementary Table 2.
Omega-fusion activity assays

The constructs used in the omega-Zif268, omega-Gt, omega-Prd and omega-Hb activity assays as well as the assay conditions are described in the Supplementary Methods.

Motifcount analysis

First, the ‘expression profile’ of a TF is determined from available data on the in situ hybridization of the TF's mRNA (17,43), which is a real-valued measurement of the TF's expression level in each of 100 equally spaced intervals (‘bins’) along the A–P axis of the Stage 4–6 (blastoderm) embryo. Next, the ‘discrete expression profile’ is calculated for a set of 48 CRMs that drive A–P gene expression in a defined pattern in the blastoderm embryo (17): for each CRM, determine whether it drives gene expression in each of the 100 bins along the A–P axis by imposing a fixed threshold on the real-valued expression levels. For each CRM, ‘count’ the number of binding sites for the TF, using its PWM and Stubb (44) as described in ref. (45). Then, for each of the 100 bins along the A–P axis, collect the set of CRMs that are ‘expressed’ in that bin, and compute the average of the binding site counts for these CRMs. This average is the TF's ‘MOTIFCOUNT’ for that bin, which is plotted along with the TF's expression profile for each bin along the A–P axis. P-values for this analysis were computed as follows:
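The P-value procedure itself is not reproduced in this text. Separately, the Motifcount averaging step described above can be illustrated with a minimal Python sketch. The function names (`discretize`, `motifcount_profile`), the use of `>=` for the threshold comparison, and the choice to return `None` for bins in which no CRM is expressed are all assumptions made for illustration; the per-CRM binding site counts are taken as given (in the paper they come from Stubb).

```python
from typing import List, Optional, Sequence


def discretize(expression: Sequence[float], threshold: float) -> List[bool]:
    """Turn a real-valued CRM expression profile into an on/off profile per bin.

    The '>=' comparison is an assumption; the text only says a fixed
    threshold is imposed on the real-valued expression levels.
    """
    return [level >= threshold for level in expression]


def motifcount_profile(
    site_counts: Sequence[float],
    crm_expressed: Sequence[Sequence[bool]],
) -> List[Optional[float]]:
    """Average binding-site count over the CRMs expressed in each bin.

    site_counts[i]      -- binding-site count for one TF in CRM i
    crm_expressed[i][b] -- True if CRM i drives expression in bin b
    Bins with no expressed CRM yield None (an assumption of this sketch).
    """
    n_bins = len(crm_expressed[0]) if crm_expressed else 0
    profile: List[Optional[float]] = []
    for b in range(n_bins):
        # Collect the counts of every CRM that is 'expressed' in this bin.
        counts = [c for c, expr in zip(site_counts, crm_expressed) if expr[b]]
        profile.append(sum(counts) / len(counts) if counts else None)
    return profile
```

In the analysis described above there would be 48 CRMs and 100 bins; the resulting profile would then be plotted against the TF's own expression profile.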
Gbrowser-based web tool Single motif tracks For each PWM, scan the genome with a sliding window of 500 bp shifted in 50-bp increments, and count the number of occurrences of the PWM in each window, using the Stubb program (44) to generate the ‘DICT’ score. The resulting profile of DICT scores is then plotted as a ‘track’ in GBrowse (39). These tracks are shown for each PWM in D. melanogaster and D. pseudoobscura in genomic coordinates of the former. A ‘two-species’ track is also plotted, combining the DICT scores of homologous windows from the two genomes. For this, each species’ DICT score is first converted to a ‘z-score’, by subtracting the genome-wide mean and then dividing by the genome-wide standard deviation, and the z-scores of the homologous windows are averaged. For D. melanogaster windows in which the syntenic region could not be properly defined using the ‘liftover’ tool (genome.ucsc.edu), the D. melanogaster z-score is halved to obtain the two-species track. Motif combination tracks Any combination of two or more PWMs can be used to create a ‘motif combination track’ that is dynamically plotted as follows: For each 500-bp window, the z-score of each PWM's DICT score is computed as above, set to zero if it is negative, and an average over the chosen combination of PWMs is regarded as the score of this window. The resulting score profile is plotted as a track. Such tracks may be created for each of the two genomes separately. A ‘two-species’ motif combination track may also be created by averaging the scores from homologous windows. The mean and standard deviation of a combination track is computed from 1 Mbp sequence on either side of the region currently displayed by the browser. The Genome-wide search tool is described in the Supplementary Methods. RESULTS Development of the omega-based B1H system Our original B1H system for characterizing DNA-binding specificity utilized TF fusions to the alpha-subunit of RNA polymerase (alpha–TF) (37,38). 
This system contains three components: the alpha–TF expression vector, a tandem HIS3-URA3 reporter cassette in a low copy number plasmid (pH3U3) and the selection strain with the bacterial homologs of the reporter genes inactivated (ΔhisB, ΔpyrF). The HIS3-URA3 reporter cassette is regulated by a weak promoter and consequently these genes, which provide a direct method for auxotrophic selection, are only weakly transcribed. However, when a functional binding site for the alpha-linked TF is present upstream of the weak promoter, RNA polymerase can be actively recruited to stimulate transcription of the reporter cassette (46). Thus, bacteria harboring a complementary interaction between the TF and reporter DNA can be selected under appropriate growth conditions, allowing binding sites complementary to a TF to be isolated from a randomized library introduced into the reporter vector. Our alpha-based system, while suitable for characterizing factors such as Cys2His2 zinc finger proteins, proved ineffective with several additional factors, including basic helix–loop–helix proteins (bHLH) and homeodomains (data not shown). The origin of this limitation was unclear, but one potential source was insufficient sensitivity: alpha is an essential gene, and as such, alpha–TF fusions are in competition with endogenous alpha for incorporation into RNA polymerase complexes. Omega is the only conserved component of bacterial RNA polymerase (α2ββ′ω) that is not required for viability under laboratory growth conditions (47). Hochschild and Dove (48) demonstrated that artificial interactions between a sequence-specific TF and the omega-subunit of RNA polymerase, like interactions with the alpha-subunit, could mediate activation of a nearby promoter. 
Because omega is not required for viability, omega-fusions have the potential advantage that selections might be performed in an omega-knockout (ΔrpoZ) strain, where omega-fusions could be uniformly incorporated into RNA polymerase without competition. Under these conditions, the selection system should be more sensitive due to the higher cellular concentration of RNAP–TF complexes, allowing weaker protein–DNA interactions to be characterized. To test this hypothesis we knocked out the rpoZ gene in our selection strain (Supplementary Figure 1) and examined the activity of an omega–Zif268 fusion with a reporter vector containing a Zif268-binding site. The fusion was expressed using three promoter strengths: a strong dual promoter (lppC-lacUV5) used for alpha-based selections, a lacUV5 promoter and a mutant lacUV5 promoter (lacUV5m) (Supplementary Figure 2). Omega–Zif268 expressed via the weakest (lacUV5m) promoter displayed robust activity, allowing cells to survive at higher 3-AT concentrations than were tolerated by the alpha–Zif268 fusion under optimal expression conditions (data not shown). Surprisingly, omega–Zif268 constructs expressed with either the dual promoter or the lacUV5 promoter proved toxic. However, for other factors (Paired, Hunchback and Giant) higher expression levels obtained using the stronger promoters were required to fully activate the reporter system (Supplementary Figure 3). The difference in promoter strengths used to drive expression of each factor was reflected in the relative protein expression levels of each factor within the cell (Supplementary Figure 4). Thus, the availability of three different promoter strengths provides flexibility to characterize a wide variety of TFs that may differ in affinity, specificity and expression level. The omega-based B1H system is sensitive to changes in the strength of the interaction between a DNA-binding domain and its target site.
The activity of omega–Zif268 with its consensus sequence was compared to three different variants of the binding site that have 4- to 20-fold reduced affinity (49). A clear correlation is observed between colony size and number with the quality of the binding site: cells containing the consensus sequence within the reporter displayed the highest rates of survival and the largest colonies relative to the survival rates and colony sizes for other sites with decreased affinity (Supplementary Figure 5). Based on these results we expect that the distribution of sequences that are recovered from a binding site selection will be a function of the difference in affinity of the protein for these sites. As a result the recognition motif constructed from the selected sites should accurately reflect the specificity of the factor. The optimal position of the Zif268-binding site was determined by examining the activity of reporters harboring sites positioned in various registers relative to the promoter (Supplementary Figure 6). Based on this analysis, a new 28-bp randomized binding site library was constructed that contains ~2 × 10⁸ unique clones, which should encode the majority of possible 12-bp sites in each frame of the binding site window. The utility of the 28-bp library in the omega-B1H system was assessed by determining the DNA-binding specificity of three well-characterized DNA-binding domains: Zif268, Mig1 and Rap1. The recognition motif for each factor generated from the selected sequences matches well with previously described specificities for these factors (Supplementary Figure 7). Thus, the omega-based B1H system and the new 28-bp binding site library can be used to rapidly determine the DNA-binding specificity of a TF. However, homeodomains did not yield a recognition motif when characterized in the standard omega-based B1H system (data not shown).
Consequently a modified version of the selection system was created for domains that are limited by either weak specificity or affinity (Figure 2 Large-scale analysis of D. melanogaster TFs To demonstrate that this technology is sufficiently rapid and simple to perform a comprehensive characterization of the TFs, we focused on characterizing the majority of the factors in the early A–P patterning network in D. melanogaster. This network contains representative members of a wide variety of DNA-binding domain families that are present in higher eukaryotes (17). Included within this set of factors are members of the five most highly represented DNA-binding domain families (51): Cys2His2 zinc fingers, homeodomains, bHLH, bZIP and winged helix as well as other less well-represented domains (Figure 1 We characterized the specificity of 35 different factors involved in the A–P pathway, which represent nine different DNA-binding domain families (Figure 3
A number of factors in the dataset lacked quality recognition motifs. Some of these factors, such as Caudal (Cad), Gt and Kni, were originally identified and described ~20 years ago, and play critical early roles in segmentation (25), yet have poorly defined specificity (Figure 4 Assessing the predictive value of the B1H-generated motifs As an initial assessment of the utility of our binding site motifs for identifying CRMs, we examined the correlation between the expression profile of each TF and the occurrence of its binding sites in 48 CRMs from D. melanogaster that drive patterned gene expression in the early embryo, using a previously described method (17). When a TF functions as an activator, one would expect an overrepresentation of its binding sites in CRMs that drive gene expression in the same spatial and temporal domains. Conversely, when a TF functions as a repressor that defines a spatial boundary for the expression of a CRM, there should be an anticorrelation between the expression profile of the TF and of CRMs that contain its binding sites. We focused on a set of eight TFs that play prominent roles in early patterning for which we could compare our characterized recognition motifs (‘B1H’) with existing motifs previously utilized for CRM discovery (‘DnaseI’) (17). We used Stubb (44) to calculate, for each CRM, a score that describes the number of binding sites for any given TF and their quality based on its PWM. The A–P axis of the embryo was divided into 100 different regions and for each such region, the average of the scores (of a TF) over all of the CRMs contributing to gene expression in that region was calculated. This average score, called Motifcount, was then compared with the expression profile of each TF (Figure 5
One additional feature of these plots is of particular interest. For some of the repressors, e.g. Gt, Hb and Hkb, there is a strong underrepresentation of binding sites in CRMs that have overlapping expression profiles. Selective pressure against the presence of these binding sites may play an important role in shaping the sequence composition of the CRM just as there is selective pressure to maintain binding sites for factors that participate in gene regulation (56). Overall, these results suggest that our B1H-generated PWMs have favorable properties for the prediction of CRMs and are superior to the previously employed PWMs for CRM discovery (17,26). A Motifcount analysis on syntenic regions to these CRMs within the D. pseudoobscura and D. mojavensis genomes generates similar plots, indicating that our PWMs should have utility for the prediction of CRMs within related species (Supplementary Figure 8).

Genome Surveyor: a new tool for identifying CRMs

We developed a new genome analysis tool, Genome Surveyor, to rapidly search for putative CRMs based on the presence of overrepresented binding sites for a combination of TFs. A simple scoring function was chosen based on its ability to readily identify known CRMs amongst a large population of random intergenic sequences (Supplementary Table 3): putative CRMs are identified by computing the average of the overrepresentation score (z-score) for a group of TFs over 500-bp windows tiled across the genome. Using our PWMs, this scoring function distinguishes CRMs in our test set with an accuracy that is similar to that of Stubb (44). Importantly, this scoring function provides an enormous advantage in speed over Stubb, as the z-scores for each factor can be calculated once across the genome and this stored information may then be used in all combination searches that include a particular TF.
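The scoring function can be illustrated with a short Python sketch that follows the description in the Materials and Methods: per-window DICT scores are converted to z-scores against the genome-wide mean and standard deviation, negative z-scores are set to zero before averaging across factors, and two-species scores average homologous windows (halving the D. melanogaster z-score where no syntenic window exists). This is a sketch under stated assumptions, not the Genome Surveyor implementation: the function names, the use of the population standard deviation, and the use of `None` to mark a missing syntenic window are all illustrative choices.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional, Sequence


def z_scores(dict_scores: Sequence[float]) -> List[float]:
    """Convert per-window DICT scores for one PWM into z-scores.

    Uses the population standard deviation over all windows (an assumption;
    the text only says 'genome-wide standard deviation').
    """
    mu = mean(dict_scores)
    sigma = pstdev(dict_scores)
    if sigma == 0:
        return [0.0 for _ in dict_scores]
    return [(s - mu) / sigma for s in dict_scores]


def combination_scores(
    z_by_tf: Dict[str, Sequence[float]], factors: Sequence[str]
) -> List[float]:
    """Average the z-scores of the chosen factors in each window,
    setting negative z-scores to zero first."""
    n_windows = len(z_by_tf[factors[0]])
    return [
        mean([max(z_by_tf[tf][w], 0.0) for tf in factors])
        for w in range(n_windows)
    ]


def two_species_scores(
    z_mel: Sequence[float], z_pse: Sequence[Optional[float]]
) -> List[float]:
    """Average z-scores of homologous windows; where the syntenic window is
    undefined (None here, by assumption), halve the D. melanogaster z-score."""
    return [
        zm / 2 if zp is None else (zm + zp) / 2
        for zm, zp in zip(z_mel, z_pse)
    ]
```

Because `z_scores` depends only on a single PWM, its output can be computed once per factor and stored; `combination_scores` then reduces any combination search to a clipped average over precomputed arrays, which is the source of the speed advantage described above.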
Our method differs from that of ecis-analyst (26) in that we value each site according to its PWM score, which allows both strong and weak sites to contribute to the overall score for each 500-bp window. The significance of the overall score in each window for each TF is determined by calculating a z-score, which reflects how the score in that window compares to the overall genomic distribution. In contrast, ecis-analyst employs a user-defined threshold (P-value) to determine if a site will be scored as present, and if defined as present, all sites contribute equally to the score. We developed a flexible user interface that operates through the GBrowse software package (39) to allow a user to utilize our scoring function and library of PWMs to search for CRMs in the D. melanogaster genome (Figure 6
We also created a Genome Search Tool within Genome Surveyor that allows a user to perform genome-wide searches for the highest scoring windows using any combination of factors. This page can be accessed via a link in the Gbrowse webpage wherein users can select the combination of factors that they want to employ in their search, the number of top hits that they want returned, and the option to search in the D. melanogaster genome alone, or in combination with the D. pseudoobscura genome. To avoid recovering peaks that are primarily the result of a strong peak for a single factor, an additional filter can be enabled that requires the combination peak score to be composed of a certain number of factors with individual scores above a desired significance threshold. Each search returns a table of positions within the D. melanogaster genome with the highest average z-scores listed in descending order (Table 1). The z-scores for each hit are listed in the D. melanogaster and D. pseudoobscura genomes as well as the combination score across both genomes. The output also includes a list of factors that are contributing significantly to the score within each region, as well as the nearest neighboring genes and their distances from the center of the binding site cluster. The location of each hit is linked back to the Gbrowse tool to enable visualization of the surrounding genomic region for more detailed inspection of the contributing factors.
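A genome-wide search of this kind, including the optional filter on the number of individually significant factors, might be sketched as follows. This is a hypothetical illustration only: `top_hits`, `min_factors` and `z_threshold` are names invented here, windows are identified simply by index, and ties are broken by window position, none of which is specified in the text.

```python
from typing import Dict, List, Sequence, Tuple


def top_hits(
    z_by_tf: Dict[str, Sequence[float]],
    factors: Sequence[str],
    n_hits: int,
    min_factors: int = 0,
    z_threshold: float = 0.0,
) -> List[Tuple[float, int, List[str]]]:
    """Return up to n_hits windows as (score, window_index, contributing_factors),
    sorted by descending combination score.

    A window is kept only if at least min_factors factors have an individual
    z-score >= z_threshold, so that one very strong factor cannot produce a
    hit on its own.
    """
    n_windows = len(z_by_tf[factors[0]])
    hits: List[Tuple[float, int, List[str]]] = []
    for w in range(n_windows):
        zs = [z_by_tf[tf][w] for tf in factors]
        contributing = [tf for tf, z in zip(factors, zs) if z >= z_threshold]
        if len(contributing) < min_factors:
            continue
        # Combination score: average z-score with negative values set to zero.
        score = sum(max(z, 0.0) for z in zs) / len(zs)
        hits.append((score, w, contributing))
    # Highest score first; ties broken by window index (an assumption).
    hits.sort(key=lambda h: (-h[0], h[1]))
    return hits[:n_hits]
```

For example, with precomputed z-scores for Bcd and Hb, calling `top_hits(z, ["Bcd", "Hb"], 15, min_factors=2, z_threshold=2.0)` would discard windows in which only one of the two factors reaches a z-score of 2.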
The effectiveness of these tools and database is evident in the top hits that are returned from a combined D. melanogaster and D. pseudoobscura genome search using TFs that are involved in anterior patterning (Bcd, Hb, Hkb, Kr and Tll; Table 1). This search produces a remarkable number of strong hits that neighbor genes with early anterior expression patterns: 13 of the 15 top hits are in genes that display early anterior expression and 8 of these 13 are in previously annotated CRMs. The top hit from this search falls within ‘eve’ stripe 1 (Figure 6

DISCUSSION

We have developed an omega-based B1H system that allows the high-throughput determination of TF DNA-binding specificities. This system has several advantages over other techniques for characterizing DNA-binding specificity. First, the use of E. coli as our platform allows the isolation of TF-binding sites in vivo with a single round of selection without protein purification. Because of the extremely high transformation efficiency of E. coli, randomized binding site libraries with complexity greater than 10⁸ members can be utilized. With omega–TF hybrids, the absence of competition from endogenous omega provides a more sensitive selection system with a much greater dynamic range than previous systems (37,57). This sensitivity has allowed us to successfully characterize TFs that failed to generate motifs in the alpha-based B1H system and makes it feasible to consider a genome-wide analysis of TF specificities. Using this system we have determined recognition motifs for 35 factors in the fly segmentation network. In addition, we have characterized the specificity of another 66 factors not directly associated with this network (Noyes et al. manuscript in preparation), which have been incorporated into our database. Together, these specificities represent ~14% of the predicted D. melanogaster TFs (52).
For comparison, the FlyREG database contains motifs for 53 TFs constructed from five or more identified binding sites (27); thus our database nearly doubles the number of specificities that are available, and in cases where these databases overlap, our data is typically of higher quality. Our data is not a perfect representation of each factor's specificity. For example, using our Knirps motif a strong region of binding site overrepresentation in ‘eve’ stripe 3 + 7 is identified, but only a weak peak is present in ‘eve’ stripe 4 + 6 (Figure 6

The rate of successful TF characterization within the B1H system makes it amenable to comprehensive surveys of TF specificity in complex organisms: once cloned, 10 or more factors can be analyzed in parallel in the B1H system in a matter of days. Our current dataset is focused primarily on monomeric DNA-binding domains, but also includes examples of homodimers and heterodimers. This reductionist approach does not address the potential for sets of factors to cooperatively recognize motifs that are not a simple composite formed from their individual motifs, such as the Exd–Hox combinations (59–61). In cases where this may be a concern, pairs of factors can be characterized in the B1H system using expression vectors developed for evaluating the specificity of heterodimers (37,38). The Genome Surveyor tool provides a fast, flexible and accessible platform to use the PWMs generated from our B1H data to identify CRMs in the fly genome. Other groups have previously used the D. melanogaster maternal and gap TFs to demonstrate that known and novel CRMs could be successfully identified within the genome based on the presence of clusters of binding sites for factors that function in a common regulatory pathway (12,14,17,26).
These studies demonstrated that even relatively crude representations of the DNA-binding specificity of a TF, typically constructed from DNaseI footprinting on a limited number of sites (52), could help identify CRMs and that these predictions could be improved by using two related fly genomes (18,26). These computational approaches, as well as an additional method (16) share the common overall strategy with Genome Surveyor of identifying clusters of overrepresented binding sites. A key distinguishing feature of Genome Surveyor is that it precalculates the quality of each binding site within each window to generate an overall score, which is evaluated relative to the genome average to provide a measure of its significance. The scores of any combination of factors can then be combined with sufficient speed to allow genome-wide searches to be performed on a webserver. Thus, Genome Surveyor, which is integrated within the GBrowse software interface, provides a particularly powerful platform for gene-specific or genome-wide searches for CRMs regulated by a user-defined combination of factors. Genome-wide searches can be performed with any combination of 101 factors over the D. melanogaster and D. pseudoobscura genomes and individual peaks of interest within the genome can then be examined using the GBrowse tools. Peaks that overlap with previously identified CRMs can be easily identified by uploading annotations for these elements from the REDfly website (redfly.ccr.buffalo.edu) (62). The number and quality of PWMs available for these searches will increase with the adoption of new, high-depth sequencing such as 454 (63,64) and SOLEXA-based sequencing (65,66) for the analysis of the B1H-selected binding sites. As the number of factors with high-quality PWMs increases, it should be feasible to annotate most potential CRMs using combinations of factors that function in common regulatory networks. 
Cooperating TFs could be identified based on common expression patterns, phenotypes or physical interactions. Because Genome Surveyor is built into the GBrowse webtool format (39), it will also be possible to incorporate other corroborating datasets into these tools, such as genome-wide ChIP analysis of TF binding or chromatin structure. The combination of these experimental and computational approaches for the identification of CRMs should provide the most robust method for the functional annotation of these elements throughout eukaryotic genomes.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.
ACKNOWLEDGEMENTS We would like to thank the Berkeley Drosophila Genome Project (BDGP) for producing the cDNA clones used in this study, the Drosophila Genomics Resource Center (DGRC) for distributing the clones, and Mark Stapleton and Susan Celniker for sharing unpublished data. Some of these ORFs were obtained from clones produced by BDGP under National Institutes of Health grant (HG002673 to S. E. Celniker). We would like to thank Robin Smith for technical support. S.A.W. M.B.N. and X.M. were supported by National Institutes of Health grants (GM068110 and HG003721 to S.A.W.), A.W. was supported in part by National Institutes of Health grant (HG003721 to S.A.W.). M.H.B. and A.W. were supported in part by a New Scholar in Aging Award from the Ellison Medical Foundation and American Cancer Society grant (RSG-05-026-01-CCG) to M.H.B. Funding to pay the Open Access publication charges for this article was provided by GM068110. Conflict of interest statement. None declared. REFERENCES 1. Levine M, Tjian R. Transcription regulation and animal diversity. Nature. 2003;424:147–151. [PubMed] 2. Ludwig MZ, Patel NH, Kreitman M. Functional analysis of eve stripe 2 enhancer evolution in Drosophila: rules governing conservation and change. Development. 1998;125:949–958. [PubMed] 3. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005;15:1034–1050. [PubMed] 4. Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, et al. Transcriptional regulatory code of a eukaryotic genome. Nature. 2004;431:99–104. [PubMed] 5. Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science. 2002;298:799–804. [PubMed] 6. 
Zeitlinger J, Zinzen RP, Stark A, Kellis M, Zhang H, Young RA, Levine M. Whole-genome ChIP-chip analysis of Dorsal, Twist, and Snail suggests integration of diverse patterning processes in the Drosophila embryo. Genes Dev. 2007;21:385–390. [PubMed] 7. Crawford GE, Holt IE, Mullikin JC, Tai D, National Institutes of Health Intramural Sequencing C, Blakesley R, Bouffard G, Young A, Masiello C, et al. Identifying gene regulatory elements by genome-wide recovery of DNase hypersensitive sites. Proc. Natl Acad. Sci. USA. 2004;101:992–997. [PubMed] 8. Sabo PJ, Humbert R, Hawrylycz M, Wallace JC, Dorschner MO, McArthur M, Stamatoyannopoulos JA. Genome-wide identification of DNaseI hypersensitive sites using active chromatin sequence libraries. Proc. Natl Acad. Sci. USA. 2004;101:4537–4542. [PubMed] 9. Dostie J, Richmond TA, Arnaout RA, Selzer RR, Lee WL, Honan TA, Rubio ED, Krumm A, Lamb J, Nusbaum C, et al. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res. 2006;16:1299–1309. [PubMed] 10. Dostie J, Dekker J. Mapping networks of physical interactions between genomic elements using 5C technology. Nat. Protocol. 2007;2:988–1002. 11. Gao F, Foat B, Bussemaker H. Defining transcriptional networks through integrative modeling of mRNA expression and transcription factor binding data. BMC Bioinformatics. 2004;5:31. [PubMed] 12. Berman BP, Nibu Y, Pfeiffer BD, Tomancak P, Celniker SE, Levine M, Rubin GM, Eisen MB. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl Acad. Sci. USA. 2002;99:757–762. [PubMed] 13. Markstein M, Markstein P, Markstein V, Levine MS. Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc. Natl Acad. Sci. USA. 2002;99:763–768. [PubMed] 14. Rajewsky N, Vergassola M, Gaul U, Siggia ED.
Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo. BMC Bioinformatics. 2002;3:30. [PubMed] 15. Lifanov AP, Makeev VJ, Nazina AG, Papatsenko DA. Homotypic regulatory clusters in Drosophila. Genome Res. 2003;13:579–588. [PubMed] 16. Sosinsky A, Bonin CP, Mann RS, Honig B. Target Explorer: An automated tool for the identification of new target genes for a specified set of transcription factors. Nucleic Acids Res. 2003;31:3589–3592. [PubMed] 17. Schroeder MD, Pearce M, Fak J, Fan H, Unnerstall U, Emberly E, Rajewsky N, Siggia ED, Gaul U. Transcriptional control in the segmentation gene network of Drosophila. PLoS Biol. 2004;2:E271. [PubMed] 18. Sinha S, Schroeder MD, Unnerstall U, Gaul U, Siggia ED. Cross-species comparison significantly improves genome-wide prediction of cis-regulatory modules in Drosophila. BMC Bioinformatics. 2004;5:129. [PubMed] 19. Kheradpour P, Stark A, Roy S, Kellis M. Reliable prediction of regulator targets using 12 Drosophila genomes. Genome Res. 2007;17:1919–1931. [PubMed] 20. Jaeger J, Reinitz J. On the dynamic nature of positional information. BioEssays. 2006;28:1102–1111. [PubMed] 21. Peel AD, Chipman AD, Akam M. Arthropod segmentation: beyond the Drosophila paradigm. Nat. Rev. Genet. 2005;6:905–916. [PubMed] 22. Pick L. Segmentation: painting stripes from flies to vertebrates. Dev. Genet. 1998;23:1–10. [PubMed] 23. Arnosti DN, Barolo S, Levine M, Small S. The eve stripe 2 enhancer employs multiple modes of transcriptional synergy. Development. 1996;122:205–214. [PubMed] 24. Arnosti DN. Analysis and function of transcriptional regulatory elements: insights from Drosophila. Annu. Rev. Entomol. 2003;48:579–602. [PubMed] 25. St Johnston D, Nüsslein-Volhard C. The origin of pattern and polarity in the Drosophila embryo. Cell. 1992;68:201–219. [PubMed] 26. Berman BP, Pfeiffer BD, Laverty TR, Salzberg SL, Rubin GM, Eisen MB, Celniker SE.
Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura. Genome Biol. 2004;5:R61. [PubMed] 27. Bergman CM, Carlson JW, Celniker SE. Drosophila DNase I footprint database: a systematic genome annotation of transcription factor binding sites in the fruitfly, Drosophila melanogaster. Bioinformatics. 2005;21:1747–1749. [PubMed] 28. Ellington AD, Szostak JW. In vitro selection of RNA molecules that bind specific ligands. Nature. 1990;346:818–822. [PubMed] 29. Tuerk C, Gold L. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science. 1990;249:505–510. [PubMed] 30. Wright WE, Funk WD. CASTing for multicomponent DNA-binding complexes. Trends Biochem. Sci. 1993;18:77–80. [PubMed] 31. Roulet E, Busso S, Camargo AA, Simpson AJ, Mermod N, Bucher P. High-throughput SELEX-SAGE method for quantitative modeling of transcription-factor binding sites. Nat. Biotechnol. 2002;20:831–835. [PubMed] 32. Bulyk ML, Huang X, Choo Y, Church GM. Exploring the DNA-binding specificities of zinc fingers with DNA microarrays. Proc. Natl Acad. Sci. USA. 2001;98:7158–7163. [PubMed] 33. Linnell J, Mott R, Field S, Kwiatkowski DP, Ragoussis J, Udalova IA. Quantitative high-throughput analysis of transcription factor binding specificities. Nucleic Acids Res. 2004;32:e44. [PubMed] 34. Berger MF, Philippakis AA, Qureshi AM, He FS, Estep PW, Bulyk ML. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol. 2006;24:1429–1435. 35. Mukherjee S, Berger MF, Jona G, Wang XS, Muzzey D, Snyder M, Young RA, Bulyk ML. Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat. Genet. 2004;36:1331–1339. [PubMed] 36. Lieb JD, Liu X, Botstein D, Brown PO.
Promoter-specific binding of Rap1 revealed by genome-wide maps of protein–DNA association. Nat. Genet. 2001;28:327–334. [PubMed] 37. Meng X, Brodsky MH, Wolfe SA. A bacterial one-hybrid system for determining the DNA-binding specificity of transcription factors. Nat. Biotechnol. 2005;23:988–994. [PubMed] 38. Meng X, Wolfe SA. Identifying DNA sequences recognized by a transcription factor using a bacterial one-hybrid system. Nat. Protocol. 2006;1:30–45. 39. Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, et al. The Generic Genome Browser: a building block for a model organism system database. Genome Res. 2002;12:1599–1610. [PubMed] 40. Schneider TD, Stephens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990;18:6097–6100. [PubMed] 41. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–1190. [PubMed] 42. Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 1994;2:28–36. [PubMed] 43. Tomancak P, Berman BP, Beaton A, Weiszmann R, Kwan E, Hartenstein V, Celniker SE, Rubin GM. Global analysis of patterns of gene expression during Drosophila embryogenesis. Genome Biol. 2007;8:R145. [PubMed] 44. Sinha S, van Nimwegen E, Siggia ED. A probabilistic method to detect regulatory modules. Bioinformatics. 2003;19(Suppl. 1):i292–i301. [PubMed] 45. Sinha S. On counting position weight matrix matches in a sequence, with application to discriminative motif finding. Bioinformatics. 2006;22:e454–463. [PubMed] 46. Dove SL, Joung JK, Hochschild A. Activation of prokaryotic transcription through arbitrary protein-protein contacts. Nature. 1997;386:627–630. [PubMed] 47. Gentry DR, Burgess RR. rpoZ, encoding the omega subunit of Escherichia coli RNA polymerase, is in the same operon as spoT. J. Bacteriol. 1989;171:1271–1277. [PubMed] 48. 
Dove SL, Hochschild A. Conversion of the omega subunit of Escherichia coli RNA polymerase into a transcriptional activator or an activation target. Genes Dev. 1998;12:745–754. [PubMed] 49. Miller JC, Pabo CO. Rearrangement of side-chains in a Zif268 mutant highlights the complexities of zinc finger-DNA recognition. J. Mol. Biol. 2001;313:309–315. [PubMed] 50. Pomerantz JL, Sharp PA, Pabo CO. Structure-based design of transcription factors. Science. 1995;267:93–96. [PubMed] 51. Tupler R, Perini G, Green MR. Expressing the human genome. Nature. 2001;409:832–833. [PubMed] 52. Adryan B, Teichmann SA. FlyTF: a systematic review of site-specific transcription factors in the fruit fly Drosophila melanogaster. Bioinformatics. 2006;22:1532–1533. [PubMed] 53. Wilson DS, Sheng G, Jun S, Desplan C. Conservation and diversification in homeodomain-DNA interactions: a comparative genetic analysis. Proc. Natl Acad. Sci. USA. 1996;93:6886–6891. [PubMed] 54. Dearolf CR, Topol J, Parker CS. The caudal gene product is a direct activator of fushi tarazu transcription during Drosophila embryogenesis. Nature. 1989;341:340–343. [PubMed] 55. Margalit Y, Yarus S, Shapira E, Gruenbaum Y, Fainsod A. Isolation and characterization of target sequences of the chicken CdxA homeobox gene. Nucleic Acids Res. 1993;21:4915–4922. [PubMed] 56. Ludwig MZ, Bergman C, Patel NH, Kreitman M. Evidence for stabilizing selection in a eukaryotic enhancer element. Nature. 2000;403:564–567. [PubMed] 57. Durai S, Bosley A, Abulencia AB, Chandrasegaran S, Ostermeier M. A bacterial one-hybrid selection system for interrogating zinc finger-DNA interactions. Comb. Chem. High Throughput Screen. 2006;9:301–311. [PubMed] 58. Clyde DE, Corado MS, Wu X, Pare A, Papatsenko D, Small S. A self-organizing system of repressor gradients establishes segmental complexity in Drosophila. Nature. 2003;426:849–853. [PubMed] 59. Pearson JC, Lemons D, McGinnis W. Modulating Hox gene functions during animal body patterning. Nat. Rev. 
Genet. 2005;6:893–904. [PubMed] 60. Ryoo HD, Mann RS. The control of trunk Hox specificity and activity by Extradenticle. Genes Dev. 1999;13:1704–1716. [PubMed] 61. Wilson DS, Desplan C. Structural basis of Hox specificity. Nat. Struct. Biol. 1999;6:297–300. [PubMed] 62. Gallo SM, Li L, Hu Z, Halfon MS. REDfly: a Regulatory Element Database for Drosophila. Bioinformatics. 2006;22:381–383. [PubMed] 63. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376–380. [PubMed] 64. Hoffmann C, Minkah N, Leipzig J, Wang G, Arens MQ, Tebas P, Bushman FD. DNA bar coding and pyrosequencing to identify rare HIV drug resistance mutations. Nucleic Acids Res. 2007;35:e91. [PubMed] 65. Johnson DS, Mortazavi A, Myers RM, Wold B. Genome-wide mapping of in vivo protein–DNA interactions. Science. 2007;316:1497–1502. [PubMed] 66. Barski A, Cuddapah S, Cui K, Roh T-Y, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129:823–837. [PubMed]