Copyright © 2007, Cold Spring Harbor Laboratory Press

Transcription factor modularity in a gene-centered C. elegans core neuronal protein–DNA interaction network

1 Program in Gene Function and Expression and Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA; 2 Center for Complex Network Research, Department of Physics, University of Notre Dame, Notre Dame, Indiana 46556, USA; 3 Agencourt Bioscience Corporation, Beverly, Massachusetts 01915, USA

4 Corresponding author. E-mail marian.walhout@umassmed.edu; fax (508) 856-5460.

Received November 24, 2006; Accepted March 7, 2007.

Abstract

Transcription regulatory networks play a pivotal role in the development, function, and pathology of metazoan organisms. Such networks are composed of protein–DNA interactions between transcription factors (TFs) and their target genes. An important question pertains to how the architecture of such networks relates to network functionality. Here, we show that a Caenorhabditis elegans core neuronal protein–DNA interaction network is organized into two TF modules. These modules contain TFs that bind to a relatively small number of target genes and are more systems specific than the TF hubs that connect the modules. Each module relates to different functional aspects of the network. One module contains TFs involved in reproduction and target genes that are expressed in neurons as well as in other tissues. The second module is enriched for paired homeodomain TFs and connects to target genes that are often exclusively neuronal. We find that paired homeodomain TFs are specifically expressed in C. elegans and mouse neurons, indicating that the neuronal function of paired homeodomains is evolutionarily conserved. Taken together, we show that a core neuronal C.
elegans protein–DNA interaction network possesses TF modules that relate to different functional aspects of the complete network.

Differential gene expression is an important driving force in the development, function, and pathology of multicellular organisms. Differential gene expression is first regulated at the level of transcription initiation by regulatory transcription factors (TFs) that directly bind to their genomic DNA targets, resulting in an activation or repression of target gene expression. In metazoans, 5%–10% of the genes encode predicted TFs (Levine and Tjian 2003; Reece-Hoyes et al. 2005), each of which likely regulates the expression of multiple target genes. TFs function in the context of intricate transcription regulatory networks that describe gene expression as a function of inputs specified by physical and functional interactions between TFs and DNA (for review, see Blais and Dynlacht 2005; Davidson and Levine 2005; Walhout 2006). Although transcription regulatory networks in unicellular systems have been studied extensively, the architecture and functionality of the networks that control multicellular development and function remain poorly understood.

A first step in deciphering transcription regulatory networks is the large-scale mapping of protein–DNA interactions (PDIs) between TFs and their target genes (Walhout 2006). TF-centered approaches, such as chromatin immunoprecipitation (ChIP), can be used to identify the DNA sequences bound by a TF in vivo (for review, see Blais and Dynlacht 2005; Elnitski et al. 2006). In complex metazoan systems, such methods are limited to TFs that are widely and highly expressed, and for which suitable antibodies are available. Gene-centered methods, such as the Gateway-compatible yeast one-hybrid system (Y1H), provide a high-throughput, condition-independent approach for the systematic identification of PDIs between gene promoters and TFs (Deplancke et al. 2004).
The Y1H system can be used to efficiently identify a wide variety of metazoan TFs. For instance, we recently described a Caenorhabditis elegans digestive tract PDI network containing >100 predicted TFs, most of which were previously uncharacterized (Deplancke et al. 2006a). In addition to validating multiple Y1H interactions in vivo, we demonstrated that this PDI network is enriched for TFs that are themselves expressed in the digestive tract. This suggests that the gene-centered Y1H system enables the identification of specific PDI networks that involve genes and TFs that function in a tissue of interest. In C. elegans, 15 types of neurons sense the chemical environment or temperature (Melkman and Sengupta 2005). These neurons are defined by the combinatorial expression of terminal differentiation genes, or “gene batteries.” This expression is accomplished through the action of different combinations of TFs (Hobert 2005). Several TFs have been reported to determine neuronal cell fate and function (Lanjuin and Sengupta 2004; Hobert 2005). However, it is unclear how the expression of these TFs themselves is regulated. To gain insight into the transcription regulatory networks that govern neuronal TF expression, we mapped a PDI network with all TF-encoding genes that are known to be expressed and/or function in C. elegans chemo- and thermosensory neurons. Since only TF-encoding genes are used as target genes, the resulting network can be considered a “core neuronal PDI network” (Davidson et al. 2003; Deplancke et al. 2006a). We find that the core neuronal PDI network contains two TF modules that each associate with different functional aspects of the network: one module contains TFs involved in reproduction and connects to target genes that are expressed in neurons and other tissues, whereas the other module is enriched for paired homeodomain TFs that bind to target genes that are primarily expressed in neurons. 
We find that paired homeodomain TFs tend to be exclusively expressed in neurons in both C. elegans and mouse, suggesting that their neuronal function is evolutionarily conserved.

Results

Mapping a core neuronal PDI network by gene-centered Y1H assays

We selected 50 promoters of TF-encoding genes as DNA baits for Y1H assays. These promoters correspond to TF-encoding genes that are known to be expressed and/or function in chemo- and thermosensory neurons and their interneurons (Supplemental Table S1). We successfully created Y1H bait strains for 47 of the 50 promoters (94%, Supplemental Fig. S1; Supplemental Table S2). To attain high PDI coverage, we performed four Y1H assays: all baits were screened versus an AD-wrmcDNA library (Walhout et al. 2000b) and an AD-TF mini-library (Deplancke et al. 2004); several baits (Supplemental Table S2) were mated versus an AD-TF yeast array (V. Vermeirssen, B. Deplancke, M.I. Barrasa, J.S. Reece-Hoyes, H.E. Arda, C.A. Grove, N.J. Martinez, R. Sequerra, L. Doucette-Stamm, M.R. Brent, et al., in prep.); and all baits were used in a final Y1H matrix experiment using available interactor TFs (Supplemental Fig. S1; Supplemental Table S2). Combined, these assays retrieved 376 PDIs (Supplemental Table S3).

High-throughput methods have the advantage of rapidly generating large data sets, but such data sets may contain false-positive information. To minimize the inclusion of false positives, we developed a stringent standardized scoring system for high-throughput Y1H assays and applied it to our data set (Supplemental Materials; Supplemental Fig. S2; Supplemental Table S4). The scoring system takes into account several criteria that contribute to the quality of a PDI, including the DNA bait, the interactor prey, and the interaction itself. After scoring, we extracted a high-confidence Y1H data set consisting of 282 PDIs between 38 promoters and 94 interactors.
We visualized the high-confidence PDIs as a “core neuronal PDI network” graph. All PDIs, except one, were connected into a single network (Fig. 1A).
From PDIs to transcription regulatory interactions by data integration

Y1H data do not provide insight into the transcriptional consequences of PDIs, i.e., activation or repression (Walhout 2006). By integrating physical interactions with previously reported regulatory information, we converted several PDIs into transcription regulatory interactions (Fig. 1B). We found two novel physical interactions between components that were known to function together in the diversification of bilateral asymmetric neurons: DIE-1 interacted with Pcog-1 and FOZI-1 bound to Plim-6 (Fig. 1B). We also identified three interologs or regulogs, i.e., evolutionarily conserved PDIs (Walhout et al. 2000a; Yu et al. 2004): VAB-3 bound to Pdac-1, TTX-1 bound to Punc-30, and UNC-30 bound its own promoter (Fig. 1B).

Functional properties of the core neuronal PDI network

We characterized the functional properties of the network using available annotations of the interactor TFs. Since we chose a set of neuronal TF-encoding genes as targets in Y1H assays, we hypothesized that the core neuronal PDI network may be enriched for neuronally expressed interactor TFs, but not for TFs that are expressed in other tissues. In WormBase version WS153 (http://www.wormbase.org), expression pattern information was available for 293 of the 940 predicted C. elegans TFs (V. Vermeirssen, B. Deplancke, M.I. Barrasa, J.S. Reece-Hoyes, H.E. Arda, C.A. Grove, N.J. Martinez, R. Sequerra, L. Doucette-Stamm, M.R. Brent, et al., in prep.). WS153 contained expression pattern information for 66% of the interactor TFs in the network. Of all 293 TFs for which expression patterns are available, 63% had a neuronal expression pattern and 73% were expressed in multiple tissues, suggesting an extensive “re-use” of TFs in different biological processes.
Despite this fact, we did observe a significant enrichment for TFs expressed in neurons in the core neuronal PDI network (76%, P < 0.05), but not for TFs expressed in the digestive tract, epithelial system, reproductive system, muscle, or excretory system in C. elegans (Table 1).
The interactor TFs in the core neuronal PDI network were significantly enriched for the homeodomain DNA-binding domain (28% in the core neuronal PDI network versus 10% in all predicted TFs, P < 0.001; Fig. 2).
To further functionally characterize the interactor TFs, we inspected their Biological Process terms in the Gene Ontology (GO) database (Ashburner et al. 2000) (Supplemental Table S8). Compared with all predicted TFs in C. elegans, the interactor TFs in the core neuronal PDI network were significantly enriched for the GO terms “development” (P < 0.01) and “response to stimulus” (P < 0.001), both of which could be expected due to the selection of the target genes. Surprisingly, we also found an enrichment for the GO term “reproduction” (P < 0.01). Taken together, we found that the core neuronal PDI network is enriched for interactor TFs that are neuronally expressed, that possess a paired homeodomain DNA-binding domain, and that associate with GO terms that relate to both organism development and function.

The core neuronal PDI network contains both interactor and promoter hubs

To explore the architecture of the core neuronal PDI network, we first examined the degree distribution of both the promoter and the interactor nodes. As expected, the outgoing connectivity of the core neuronal PDI network followed a power law (Fig. 3A).
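The degree analysis described here can be illustrated with a minimal Python sketch. It assumes the network is available as a list of (interactor, promoter) pairs. The toy edge list, the node names, and the function names are hypothetical placeholders, not data or code from this study; the 5% cutoff mirrors the hub definition given later in the text.

```python
from collections import Counter
from math import ceil


def degree_counts(edges):
    """Return (out_degree, in_degree) Counters for a directed PDI edge list.

    Each edge is an (interactor, promoter) pair. Out-degree counts the
    promoters bound by each interactor; in-degree counts the interactors
    bound to each promoter.
    """
    unique_edges = set(edges)  # count each PDI once, even if observed twice
    out_degree = Counter(tf for tf, _ in unique_edges)
    in_degree = Counter(promoter for _, promoter in unique_edges)
    return out_degree, in_degree


def top_fraction(degree, fraction=0.05):
    """Return the most highly connected nodes (top `fraction`, at least one)."""
    n_hubs = max(1, ceil(fraction * len(degree)))
    return [node for node, _ in degree.most_common(n_hubs)]


# Hypothetical toy network, for illustration only.
toy_edges = [
    ("TF_A", "P1"), ("TF_A", "P2"), ("TF_A", "P3"),
    ("TF_B", "P1"), ("TF_C", "P2"),
    ("TF_A", "P1"),  # duplicate observation, ignored
]
out_degree, in_degree = degree_counts(toy_edges)
interactor_hubs = top_fraction(out_degree)
```

Plotting the frequency of each out-degree value on log-log axes is one common way to inspect whether such a distribution resembles a power law; the fitting procedure used in the study is not described in this section.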
The digestive tract PDI network, which contains target genes expressed in the C. elegans pharynx, intestine, and/or rectum (Deplancke et al. 2006a), contains a number of interactor hubs that bind to target genes that are expressed in both the pharynx and the intestine, even though these organs are derived from distinct germ layers. Therefore, we hypothesized that these interactor hubs may function as global regulators of gene expression, i.e., they control the expression of many genes expressed in many different tissues. Eighty-three percent of the interactor hubs (here defined as the 5% most highly connected interactors) in the core neuronal PDI network were also retrieved as interactor hubs in the digestive tract PDI network, whereas only 47% of the less well-connected interactors overlap (Fig. 3C).

Cross-regulation

Cross-regulation occurs when, within a system of interest (i.e., cell, tissue, or organ), many TFs regulate each other’s expression. Previously, cross-regulation has been observed in both human and yeast regulatory networks (Borneman et al. 2006; Odom et al. 2006). In the core neuronal PDI network, we also found significant cross-regulation, as 16 of the 94 interactors corresponded to TFs that were included in the target gene set (P < 0.001). Interestingly, multiple target genes encode homeodomain TFs (41%), and the core neuronal PDI network is enriched for homeodomain interactor TFs (Fig. 2).

The core neuronal PDI network contains two TF modules

Metabolic and protein–protein interaction networks can often be decomposed into functional modules: groups of highly interconnected components that together carry out particular biological functions (Ravasz et al. 2002; Han et al. 2004; Yook et al. 2004; Gunsalus et al. 2005). To examine whether the core neuronal PDI network possesses a modular architecture, we analyzed the topological overlap coefficient (TOC) or mutual clustering coefficient for each pair of nodes (Ravasz et al. 2002; Goldberg and Roth 2003).
This is a relative measure for the number of interaction partners shared by a pair of nodes, and ranges from 0 for node pairs that do not share any interacting partners to 1 for node pairs that share all interacting partners (Fig. 4A).
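A pairwise overlap score of this kind can be sketched in a few lines of Python. The exact formula used in the study is not given in this section, so the sketch below assumes one common variant, the "meet/min" form: the number of shared partners divided by the smaller of the two partner counts. The function names and the toy target sets are hypothetical.

```python
from itertools import combinations


def topological_overlap(partners_a, partners_b):
    """Meet/min overlap between two sets of interaction partners.

    Returns 0.0 when no partners are shared and 1.0 when every partner of
    the less-connected node is also a partner of the other node.
    This is an assumed variant, not necessarily the one used in the paper.
    """
    if not partners_a or not partners_b:
        return 0.0
    shared = len(partners_a & partners_b)
    return shared / min(len(partners_a), len(partners_b))


def pairwise_overlap(partners_by_node):
    """Compute the overlap score for every unordered pair of nodes."""
    return {
        (a, b): topological_overlap(partners_by_node[a], partners_by_node[b])
        for a, b in combinations(sorted(partners_by_node), 2)
    }


# Hypothetical interactors and the promoters they bind, for illustration only.
toy_targets = {
    "TF_A": {"P1", "P2", "P3"},
    "TF_B": {"P1", "P2"},
    "TF_C": {"P4"},
}
overlap = pairwise_overlap(toy_targets)
```

Clustering the resulting pairwise matrix (for example, hierarchically) is a typical next step for revealing groups of nodes with similar partners, which is how modules are usually read off such scores.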
Functional aspects of the core neuronal PDI network are reflected in the two TF modules

The two interactor modules could be traced back to different sets of promoter hubs (Fig. 4C). Next, we asked whether the functional properties of the core neuronal PDI network were reflected in the modules. As expected, both modules contained many neuronally expressed TFs (Supplemental Table S6). However, we found several functional differences between the two modules. Only module 1 was enriched for the GO terms “reproduction” and “response to stimulus” (P < 0.001, Supplemental Table S8). Since the numbers are small, we also investigated the available mutant and RNAi phenotypes as documented in WormBase (http://www.wormbase.org) and found the same results (Fig. 4C).

Paired homeodomain TFs are expressed in neurons in both C. elegans and mouse

Since the core neuronal PDI network and module 2 were both enriched for interactor TFs that are expressed in neurons and that possess a (paired) homeodomain DNA-binding domain, we hypothesized that (paired) homeodomain-containing TFs are likely to be neuronally expressed. To test this, we analyzed the available expression patterns of all predicted C. elegans TFs in WS153. Indeed, we found a positive association between neuronal expression and homeodomains both for TFs that are expressed in neurons as well as in other tissues (P < 0.01) and for TFs that are exclusively neuronal (P < 0.05; Fig. 5A).
Next, we examined whether the positive association between paired homeodomains and neuronal expression is evolutionarily conserved. We analyzed murine tissue-specific TF gene expression profiles available in SymAtlas (Su et al. 2004). We also observed a significant enrichment for murine homeodomain TFs that are exclusively expressed in neurons (P < 0.001; Fig. 5B). Finally, we noticed additional correlations between TF families and expression patterns (Fig. 5).

Discussion

Gene-centered PDI networks

In this study, we present a gene-centered PDI network of neuronal TF-encoding genes in the nematode C. elegans. This network contains 282 high-confidence PDIs, between 38 promoters and 94 interactors. Most of the PDIs are novel and most TFs retrieved were heretofore uncharacterized. This demonstrates that gene-centered approaches rapidly expand our knowledge about PDIs and help annotate both individual TFs and TF families (see below).

Several observations demonstrate that the Gateway-compatible Y1H system yields high-quality PDIs. First, we show that a stringent and standardized scoring system can be used to extract high-confidence Y1H interactions. Second, we show that networks derived from sets of target genes expressed in a particular tissue are enriched for interactors that are also expressed in that tissue (this study; Deplancke et al. 2006a). Third, by integrating PDI data with regulatory information, we convert PDIs into transcription regulatory interactions. Fourth, we find several interologs. Fifth, many PDIs can be connected to previously reported observations. For instance, we find interactions between factors and target genes that are expressed in the same cell(s), e.g., Pnhr-38/CEH-14 in AFD sensory neurons (Miyabayashi et al. 1999; Cassata et al. 2006) and Punc-30/ALR-1 in GABAergic neurons (Melkman and Sengupta 2005).
In addition, we find a putative regulatory cascade that may be involved in dauer formation: DAF-16 binds the daf-3 promoter, DAF-3 binds the daf-19 promoter, and DAF-19 regulates its own expression.

Network architecture

Cellular networks are characterized by a scale-free connectivity distribution due to the presence of highly connected nodes, or hubs (Barabasi and Oltvai 2004). Our observation that C. elegans interactor hubs connect to genes expressed in different tissues and cell types suggests that such global regulators function throughout the animal. In agreement with this, we previously found that global regulators tend to be broadly expressed, to be essential for viability, and to be toxic when overexpressed (Deplancke et al. 2006a). We find that interactors with a low out-degree tend to be more specific for either the neuronal or the digestive tract PDI network. Together, the finding of global and specifier regulators supports a model of a layered organization of TF function in C. elegans transcription regulatory networks (Deplancke et al. 2006a).

The in-degree distribution of transcription regulatory networks in yeast and bacteria has been reported to decay exponentially, which is similar to the degree distribution of random networks, and suggests that there are no clear promoter hubs (Thieffry et al. 1998; Guelzim et al. 2002). However, highly connected promoters have been described in yeast regulatory networks (Yu et al. 2004; Borneman et al. 2006). The incoming degree distribution of the core neuronal PDI network was best fit by a power law with saturation. Although the biological significance of this fit is at present unclear, it may result from the gene duplication-driven growth and evolution of (biological) networks (Albert and Barabasi 2000; Vazquez et al. 2003). The incoming degree distribution points to the presence of promoter hubs. Indeed, we do find several promoter hubs, such as Punc-30, which binds 36 different TFs.
Such promoter hubs may be specific for core transcription regulatory networks, because the promoters of TF-encoding genes have been proposed to be subject to more complex regulation than other genes (Nelson et al. 2004; Woolfe et al. 2005). Future studies with non-TF genes are required to determine whether the in- and out-degree distributions of the core neuronal PDI network are a reflection of the complete C. elegans transcription regulatory network. Previously, it has been suggested that highly regulated TFs may function as master regulators of development (Borneman et al. 2006). This notion is supported by the fact that UNC-30, the most highly connected promoter hub in our network, functions as a master regulator for the terminal differentiation of type-D GABAergic motor neurons (Jin et al. 1994).

TF modularity in PDI networks

Through a combination of PDI mapping, network analysis, TF family annotation, and gene expression and ontology analysis, we demonstrate for the first time that a metazoan PDI network is organized into TF modules that relate to specific functionalities. Previously, it has been shown that regulatory networks from unicellular systems such as bacteria and yeast possess a modular architecture (Babu et al. 2004; Resendis-Antonio et al. 2005). Bacterial PDI network modularity was found by clustering the shortest path length between any pair of genes in an undirected manner (Resendis-Antonio et al. 2005). Since PDI networks contain directed interactions, two types of modules can potentially occur: gene modules that contain genes that share interacting TFs or TF modules that contain TFs that share target genes. Gene modules have previously been identified by expression-profiling correlation of TFs and their target genes (Tavazoie et al. 1999; Segal et al. 2003; Ghazalpour et al. 2006) and by integrating such data with physical interaction data obtained from TF-centered approaches (Bar-Joseph et al. 2003; Beyer et al. 2006).
We did not observe any target-gene modularity in the core neuronal PDI network. This is likely because we only used promoters of neuronal genes and because most promoters bind a combination of global and specifier TFs, most of which bind only one or two promoters (Fig. 3A). We did find two TF modules in the core neuronal PDI network. In metazoan PDI networks, such TF modularity is uniquely revealed by a gene-centered approach, which enables the identification of numerous TFs (94 interactors in this study) in a standardized manner. The two TF modules consist of specifier TFs and are connected to each other by several other TFs, including all of the putative global regulators. Moreover, we find that the two modules associate with different functional aspects of the total network.

One of the most striking findings is that module 2 is enriched for paired homeodomain TFs and contains target genes that are predominantly expressed in neurons (which is in contrast to the targets of module 1 that are expressed more broadly). This correlation suggests that paired homeodomain TFs specifically regulate neuronal gene expression. Indeed, we find that paired homeodomain TFs tend to be exclusively expressed in neurons in both C. elegans and mouse. Several homeodomain genes are known to play a role in the development of the central nervous system in Drosophila and vertebrates (Kammermeier and Reichert 2001; Akin and Nazarali 2005). Recently, an over-representation of homeodomain binding sites has been detected in the promoters of odorant receptors in mouse (Michaloski et al. 2006), and some of these were shown to be required for normal odorant receptor expression (Rothman et al. 2005). This further confirms that homeodomain TFs function in sensory neurons. Several individual C. elegans and murine paired homeodomain TFs are known to function in neurons (Miller et al. 1992; Jin et al. 1994; Baran et al. 1999; Pujol et al. 2000; Altun-Gultekin et al. 2001; Boyl et al. 2001; Satterlee et al.
2001; Lanjuin et al. 2003; Branicky and Hekimi 2005; Melkman and Sengupta 2005; Tucker et al. 2005; Friocourt et al. 2006). Our data suggest that uncharacterized (paired) homeodomain TFs may also function to regulate neuronal gene expression and function.

Multiple paired homeodomain TFs in module 2 bind the same target promoters, and may perhaps interact with the same site within these promoters. Since we found that paired homeodomain TFs tend to be specifically expressed in neurons, this suggests that several members of this TF family may bind the same neuronal gene promoter in vivo, for instance, under different conditions during development or function of the animal. Future studies, for instance, using ChIP, will be required to determine whether all or only some of these TFs interact with their Y1H targets in vivo. In addition, it will be important to determine which of the interacting TFs actually affect target gene expression in vivo and where and when in the (developing) animal these effects occur.

Taken together, by integrating physical interactions with regulatory events, TF families, expression patterns and profiles, and functional annotations, we show that metazoan PDI networks have a modular architecture that relates to network functionality. Such modularity will provide a powerful tool to understand how networks relate to biology and, potentially, to annotate gene function in complex metazoan organisms.

Methods

Generation of Y1H promoter bait strains

Detailed Y1H protocols are described elsewhere (Deplancke et al. 2006b). Promoters for 47 neuronal transcription regulatory genes were selected as DNA baits (Supplemental Table S1). For daf-16 and daf-12, two promoters were selected, based on different variants of each gene. For ceh-37, two promoter baits were created, one upstream of the ATG and one upstream of the 5′-UTR, since this is >10 kb upstream of the first exon. Twenty promoter Entry clones were retrieved from the promoterome (Dupuy et al. 2004).
We attempted to clone 30 promoters ab initio (i.e., by PCR from genomic DNA using the translational start reported in WS153), and cloned all except Pdaf-12a and Pnhr-36 (Supplemental Fig. S1). With the exception of Pfozi-1, all promoters were transferred into the Y1H destination vectors pMW#2 and pMW#3 and integrated into the genome of Saccharomyces cerevisiae YM4271. Promoters that exhibit high self-activation for the lacZ reporter could only be analyzed for activation of the HIS3 reporter (Supplemental Table S2). Pdaf-12b lacZ could not be integrated into the yeast genome, and thus, only the HIS3 reporter could be used for this promoter.

Y1H screens were performed with individual DNA bait strains versus both AD-wrmcDNA (Walhout et al. 2000b) and AD-TF (Deplancke et al. 2004) prey libraries (Supplemental Fig. S1). Only double positives were considered, except when lacZ was absent or highly self-active. All PDIs were retested by PCR/gap repair. PCR products corresponding to preys that retested were sequenced by Agencourt Bioscience Corporation. In total, 805 Interaction Sequence Tags were obtained (i.e., 5′ tag sequences of the interactor preys) (Walhout et al. 2000a). The PDI data for Pdaf-3, Pdaf-19, Pdie-1, and PT22H9.4 were complemented by results obtained previously (Deplancke et al. 2006a). Y1H mating assays against an AD-TF array (V. Vermeirssen, B. Deplancke, M.I. Barrasa, J.S. Reece-Hoyes, H.E. Arda, C.A. Grove, N.J. Martinez, R. Sequerra, L. Doucette-Stamm, M.R. Brent, et al., in prep.) were performed with 16 bait strains (Supplemental Table S2). Finally, all available interactors were transformed into each of the promoter strains. In addition to verifying PDIs, this enables the identification of additional PDIs (Supplemental Fig. S1). Preys found by Y1H screens and mating experiments were used, as well as TFs corresponding to the Y1H promoter baits (150 preys were used in total; Supplemental Table S3).
The following preys could not be examined because an ORF clone was not available: HMG-11, LIN-49, NHR-36, NHR-45, NHR-83, R06C1.6, UNC-86, Y59E9AL.2/3, and ZK287.6. Ninety-nine percent of the 7050 transformations were successful.

We performed these different Y1H assays because screens alone are not saturating. Many TFs are likely under-represented because the cDNA library is not normalized, and may therefore be difficult to retrieve when a few million colonies are screened. We find that we obtain the highest coverage by screening both libraries. Some TFs cannot be detected when full-length proteins are used in Y1H. However, since the cDNA library contains many incomplete ORFs, these may be retrieved from this library. In addition, the cDNA library enables the identification of novel putative TFs.

C. elegans TF expression patterns

Only interactors possessing a predicted DNA-binding domain were considered (i.e., novel putative TFs were excluded). Expression pattern information for the 940 predicted C. elegans TFs in wTF2.1 (V. Vermeirssen, B. Deplancke, M.I. Barrasa, J.S. Reece-Hoyes, H.E. Arda, C.A. Grove, N.J. Martinez, R. Sequerra, L. Doucette-Stamm, M.R. Brent, et al., in prep.) was retrieved using WormMart in WormBase WS153. An expression-pattern code was attributed to each TF as follows: (E) embryonic, (G) germline, (H) epithelial tissues except for neuronal support cells, (I) intestine, (M) muscle, (N) neuron, (O) other or unidentified cells, (P) pharynx, (R) somatic reproductive tissue, (S) neuronal support cells (socket/sheath/glial cells), and (X) excretory system. These can be grouped to describe expression in the following systems: (HS) epithelial system, (PI) digestive tract, and (GR) reproductive system. In the reproductive and digestive tract systems, only expression in the structural parts (i.e., musculature, glands, epithelium) was annotated, while expression in the neurons within those systems was considered N.
Some examples: pharyngeal neurons (N), vulval muscle (MR), rectal epithelium (HI), amphids or phasmids (NS), male sensory rays (NS), broad (EGHIMNOPRSX), and all somatic cells (EHIMNOPRSX). Embryonic expression was not included in subsequent analyses. Expression pattern information was available for 293 (N) of the 940 predicted TFs in wTF2.1. Information was available in WS153 for 55 (n) of the 83 interactor TFs in the core neuronal PDI network. Enrichment for expression in a certain tissue was calculated using a hypergeometric distribution. For a set of n (55) interactor TFs, of which k are annotated with an expression pattern in a certain tissue that exists in K of the N (293) C. elegans TFs, the hypergeometric P-value is given by:
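The formula that followed this sentence is not preserved in this copy of the text. As a hedged illustration, the standard upper-tail hypergeometric P-value consistent with the definitions above is P = sum over i from k to min(n, K) of C(K, i) C(N − K, n − i) / C(N, n), where C(a, b) is the binomial coefficient; that the authors used exactly this form is an assumption. A minimal Python sketch follows (the function name and the toy numbers are hypothetical):

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric P-value, P(X >= k).

    N: population size (e.g., TFs with an annotated expression pattern)
    K: population members carrying the annotation of interest
    n: sample size (e.g., annotated interactor TFs in the network)
    k: sample members carrying the annotation
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("require 0 <= K <= N, 0 <= n <= N and 0 <= k <= n")
    total = comb(N, n)  # number of ways to draw the sample
    # math.comb returns 0 when the lower index exceeds the upper index,
    # so impossible terms drop out of the sum automatically.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Toy numbers for illustration only: a population of 10 with 5 annotated
# members, and a sample of 5 in which all 5 are annotated.
p_toy = hypergeom_enrichment_p(k=5, n=5, K=5, N=10)
```

Exact integer arithmetic is used before the final division, which avoids floating-point underflow for the small P-values that are typical of enrichment tests.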
C. elegans DNA-binding domains

DNA-binding domains for the 940 predicted C. elegans TFs were extracted from wTF2.1 (V. Vermeirssen, B. Deplancke, M.I. Barrasa, J.S. Reece-Hoyes, H.E. Arda, C.A. Grove, N.J. Martinez, R. Sequerra, L. Doucette-Stamm, M.R. Brent, et al., in prep.). Only DNA-binding domain classes with 30 or more members were considered (homeodomain, AT hook, basic helix-loop-helix, bZIP, winged helix, zinc finger, and others). Zinc fingers were divided into C2H2 zinc fingers, nuclear hormone receptors, and other zinc fingers. By a χ2 test we first analyzed whether the nodes in the network represent a random sample of wTF2.1. Any class with fewer than five expected members in the χ2 test was added to the class of “other DNA-binding domains.” The enrichment or depletion for a specific DNA-binding domain in the network was determined by a hypergeometric distribution (N = 962 DNA-binding domains; some TFs possess multiple DNA-binding domains). A similar approach was followed to analyze the DNA-binding domain distribution within the homeodomain class.

Gene Ontology analysis

The enrichment of Biological Process Gene Ontology terms for the interactor TFs in the network or modules was calculated by a Fisher test using only genes for which such terms were available in WormBase (WS164). They were available for 83% of the interactor TFs in the network and for 82% and 100% of the TFs in modules 1 and 2, respectively. Biological Process Gene Ontology terms were available for 76% of all 940 predicted C. elegans TFs. This was used as background population in the statistical analysis. We calculated the enrichment in the complete network for the following terms: reproduction, development, physiological process, growth, cellular process, regulation of biological process, and response to stimulus. In the modules, we only examined the three GO terms that were significantly enriched in the entire network (reproduction, development, and response to stimulus).
Nominal P-values are included in the main text. We also corrected for multiple hypothesis testing by applying a Bonferroni correction (Supplemental Table S8). Finally, we confirmed the statistical enrichments by an independent method, FuncAssociate, which is based on random sampling (Berriz et al. 2003; data not shown). The enrichment of Biological Process Gene Ontology terms for the paired homeodomains in mouse was calculated by the DAVID functional annotation tool (http://david.abcc.ncifcrf.gov/) (Dennis et al. 2003).

Cross-regulation

The enrichment for cross-regulation, i.e., the retrieval of interactor TFs of which the promoters were also used as DNA baits, was calculated by a hypergeometric distribution (see above) using all 940 TFs (N) as the population. In total, 16 (k) of the 36 TF-encoding target genes (K) were retrieved as interactors of the 83 TFs (n) in the network.

C. elegans DNA-binding domain–expression pattern association

WS153 was used to examine whether there is an association (Fisher test) between the presence of a particular DNA-binding domain in a TF and the expression of this TF in a particular C. elegans tissue. A total of 304 expression patterns were available for all DNA-binding domains (some TFs contain multiple DNA-binding domains). WS153 contained expression information for 69% of all homeodomains, 23% of all AT hooks, 51% of all basic helix-loop-helices, 25% of all bZIPs, 56% of all winged helices, 21% of all C2H2 zinc fingers, 20% of all nuclear hormone receptors, 35% of all other zinc fingers, and 32% of all other DNA-binding domains in wTF2.1. Within the homeodomain class, HOX, paired, and NK homeodomains were examined. The following tissues were considered: neurons, muscle, digestive tract, epithelial system, reproductive system, and excretory system.

Mouse DNA-binding domains

Mouse TFs were compiled by first downloading mouse TF predictions from DBD, a transcription factor database (version 1.2) (Kummerfeld and Teichmann 2006).
Ensembl protein IDs were mapped to gene IDs using BioMart (http://www.ensembl.org/Multi/martview) and all predictions for a gene were merged. A total of 1305 mouse TFs that possessed a total of 1421 DNA-binding domains were obtained. To identify the specific types of homeodomain within the homeodomain class, the INTERPRO (version 12.1) identifiers IPR007104 for paired, IPR007107 for LIM, and IPR007103 for POU homeodomains were used. HOX and NK homeodomains were identified from the literature, as INTERPRO identifiers are not available for these (Luke et al. 2003; Akin and Nazarali 2005).

Additional Methods are available in Supplemental Materials.

Acknowledgments

We thank members of the Walhout laboratory, Jason Perry, Mark Alkema, and Job Dekker for advice and critical reading of the manuscript, and the sequencing staff at Agencourt Bioscience for technical assistance. We thank Piali Sengupta for discussions and advice on target gene selection, and Fritz Roth for statistical advice. This work was supported by grants NIH DK068429 (A.J.M.W.), NIH A1070499-01 (A.L.B.), CA113004 (A.L.B.), NSF IIS-0513650 (A.L.B. and C.H.), and a D. Collen Research Foundation-Belgian American Educational Foundation (BAEF) fellowship for Biomedical and Biotechnology Research to V.V.

Footnotes

[Supplemental material is available online at www.genome.org.]

Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6148107

References