Transcription factor functionality and transcription regulatory networks

Program in Gene Function and Expression and Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA. E-mail: marian.walhout/at/umassmed.edu. Corresponding author.

The publisher's final edited version of this article is available at Mol Biosyst.

Abstract

Now that numerous high-quality complete genome sequences are available, many efforts are focusing on the “second genomic code”, namely the code that determines how the precise temporal and spatial expression of each gene in the genome is achieved. In this regard, the elucidation of transcription regulatory networks that describe combined transcriptional circuits for an organism of interest has become valuable to our understanding of gene expression at a systems level. Such networks describe physical and regulatory interactions between transcription factors (TFs) and the target genes they regulate under different developmental, physiological, or pathological conditions. The mapping of high-quality transcription regulatory networks depends not only on the accuracy of the experimental or computational method chosen, but also on the quality of TF predictions. Moreover, the total repertoire of TFs is determined not only by the protein-coding capacity of the genome, but also by different protein properties, including dimerization, co-factor interactions and post-translational modifications. Here, we discuss the factors that influence TF functionality and, hence, the functionality of the networks in which they operate.

1 Introduction

Transcription regulatory networks can be represented as graph models that combine physical and regulatory interactions between TFs and their target genes (reviewed in ref. 1).
Several methods that can be used to identify physical interactions between TFs and their targets have been developed and applied to the study of both yeast and metazoan transcription regulatory networks. These include TF-centered methods such as chromatin immunoprecipitations,2–4 protein binding microarrays,5 DamID6,7 and bacterial one-hybrid assays,8 as well as gene-centered methods such as high-throughput yeast one-hybrid assays.9 The TF-DNA interaction data obtained by these methods are often visualized as network models (Fig. 1).
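To make the graph representation concrete, the following is a minimal sketch of a directed network with edges pointing from TFs to the genes they bind. All TF and gene names (`tfA`, `gene1`, and so on) are hypothetical placeholders rather than data from any of the cited studies, and the two adjacency maps are simply one convenient way to store such a graph.

```python
from collections import defaultdict

# Hypothetical TF -> target interactions (placeholder names, not real data).
interactions = [
    ("tfA", "gene1"),
    ("tfA", "gene2"),
    ("tfB", "gene2"),
    ("tfB", "gene3"),
    ("tfC", "gene2"),
]


def build_network(edges):
    """Return two adjacency maps for a directed TF -> gene graph."""
    targets_of = defaultdict(set)     # TF -> genes it binds (outgoing edges)
    regulators_of = defaultdict(set)  # gene -> TFs that bind it (incoming edges)
    for tf, gene in edges:
        targets_of[tf].add(gene)
        regulators_of[gene].add(tf)
    return targets_of, regulators_of


targets_of, regulators_of = build_network(interactions)

# Out-degree: number of targets per TF. In-degree: number of TFs per gene.
out_degree = {tf: len(genes) for tf, genes in targets_of.items()}
in_degree = {gene: len(tfs) for gene, tfs in regulators_of.items()}
```

In this layout, a TF-centered experiment fills in one entry of `targets_of` at a time, whereas a gene-centered experiment fills in one entry of `regulators_of`; both yield the same kind of edge.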
TF modules are another feature of transcription regulatory networks that has recently emerged.11 TF modules are distinct from more extensively studied general network modules that are defined as highly interconnected groups of nodes without regard for directionality or node type (i.e., TFs vs. target genes).13,14 We have defined TF modules as sets of TFs that share many of their target genes (Fig. 1).

The transcription regulatory networks that have been described to date represent compilations of multiple events that take place during the lifetime of an organism, collapsed into a single model. In reality, however, only a subset of the network is active in particular cell types, under different developmental or physiological conditions, or at any given time.17–19 In addition, each TF in regulatory networks is represented as a single node, whereas it is known that TFs exist in many different functional forms that are determined by a variety of factors including post-translational modifications and dimerization. These factors themselves may depend on specific developmental or physiological conditions. Each TF form may interact with distinct target genes, or with the same target gene, but under different conditions. Here, we discuss the factors that need to be taken into account to determine how many functional TFs occur in an organism of interest, and how this information can be incorporated into transcription regulatory network models to study differential gene expression at a systems level.

2 Transcription factor predictions

There are two classes of TFs: basal TFs that are involved in transcription of most, if not all, genes, and regulatory TFs that control only subsets of genes.20 For the understanding of differential gene expression at a systems level, we only consider regulatory TFs (hereafter referred to as TFs). TFs interact with their target genes by binding specific cis-regulatory gene elements through a sequence-specific DNA binding domain. Different DNA binding domains are used to group TFs into TF families. Examples of DNA binding domains include basic region helix–loop–helix domains (bHLH), homeodomains and various types of zinc fingers. Computational tools have been developed both to define consensus DNA binding domains and to predict additional TFs of that family encoded by a genome of interest.21,22 We found that, although such computational tools are powerful, they do incorporate false predictions and miss many known TFs.23 For instance, by using a combination of computational tools and extensive manual curation we predicted a high-quality compendium of 934 TFs in the nematode Caenorhabditis elegans, which extended purely computational predictions by ~50%.23 However, even though this compendium is more complete than previous collections, it is not yet comprehensive: algorithms and experimental assays continue to improve and, therefore, additional TFs continue to be discovered.24

3 DNA binding domains

Over the past decades, many different sequence-specific DNA binding domains have been uncovered. However, we propose that it is unlikely that all DNA binding domains are known. This is because, by applying yeast one-hybrid assays to only 112 C. elegans gene promoters, we have already discovered 11 C. elegans proteins that robustly interact with their target promoters in yeast, but that do not possess a known DNA binding domain. By using chromatin immunoprecipitation assays in yeast, we confirmed that these interactions are direct for nine of these proteins.10,11 We do not know yet whether these proteins directly bind to DNA or if they are recruited to their target promoters by interacting with other DNA binding proteins. Future structure–function analysis will provide insight into the mechanism of action of these novel putative TFs. Importantly, the cataloging of the DNA binding domain(s) of these proteins may enable the identification of additional proteins with similar domains in C. elegans, and perhaps in other organisms as well.

4 Alternative splicing

In metazoans, many gene transcripts, including those encoding TFs, are alternatively spliced, which often leads to multiple variants of a protein. Interestingly, it has been found that TF-encoding genes in mice undergo alternative splicing more frequently than other genes.25 Alternative splicing may lead to TFs with different functions. For instance, DNA binding domains or transcription regulatory domains may be included or excluded from the TF variant. At least 144 C. elegans TFs undergo alternative splicing, resulting in 379 different proteins, 30 of which lack a DNA-binding domain.23 The latter may function as regulators of TF function, for instance by titrating interaction partners of the corresponding TFs that do possess a DNA binding domain. Several C. elegans TFs contain more than one DNA binding domain and alternative splicing can affect which domains are present in the different protein products. For instance, several DAF-16 variants are generated as a result of alternative splicing, and each variant carries a unique combination of domains26 (Fig. 2(A)).
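The effect of alternative splicing on domain content can be illustrated with a small sketch that records each isoform as a set of domains and separates isoforms that retain a DNA binding domain from those that do not. The gene name `tf-x`, the isoform names and the domain labels are hypothetical and chosen only for illustration; they do not describe DAF-16 or any other real TF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Isoform:
    gene: str           # gene that encodes the isoform
    name: str           # isoform label
    domains: frozenset  # protein domains present in this splice variant


# Hypothetical labels for domains that count as DNA binding domains.
DNA_BINDING = frozenset({"DBD1", "DBD2"})

# Hypothetical splice variants of a single TF-encoding gene.
isoforms = [
    Isoform("tf-x", "a", frozenset({"DBD1", "DBD2", "activation"})),
    Isoform("tf-x", "b", frozenset({"DBD1", "activation"})),
    Isoform("tf-x", "c", frozenset({"activation"})),
]


def binds_dna(isoform):
    """True if the isoform retains at least one DNA binding domain."""
    return bool(isoform.domains & DNA_BINDING)


dna_binding = [i.name for i in isoforms if binds_dna(i)]
non_binding = [i.name for i in isoforms if not binds_dna(i)]

# Number of distinct domain combinations produced by this gene.
distinct_combinations = len({i.domains for i in isoforms})
```

Here isoforms `a` and `b` would be treated as DNA-binding TF variants, whereas `c` lacks a DNA binding domain and would be a candidate regulator of the other two, for example by titrating shared interaction partners.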
5 Dimerization

Several TFs bind DNA as obligatory dimers, including members from the basic region leucine zipper (bZIP), bHLH and nuclear hormone receptor (NHR) families.30–32 Dimerization should be taken into account when considering the total complement of functional TFs because, if a particular TF only functions when it dimerizes with another TF, the dimer should be considered a single functional unit. Dimerization can affect the total number of functional TFs in different ways (Fig. 2(B)).

6 Post-translational modifications

Many proteins, including TFs, are post-translationally modified under different conditions and by different modifiers. Several post-translational modifications of TFs have been reported, including phosphorylation, hydroxylation, acetylation, ubiquitination and sumoylation (Fig. 2(C)).

7 Ligands

Many TFs become activated or inactivated as a result of ligand binding (Fig. 2(D)).

Another class of ligand-binding TFs is the bHLH-PAS sub-family that includes the aryl hydrocarbon receptor (AHR). AHR can interact with a variety of exogenous compounds or toxins such as dioxin, and mediate a biological response (for a review see ref. 40). The range of compounds that can activate AHR is still under investigation, and although most appear to be exogenous in origin, it has been proposed that endogenous AHR ligands may play a role in organism development or homeostasis.41 Indeed, the C. elegans ortholog of AHR, ahr-1, is required for the proper development and specification of touch-receptor neurons, interneurons, and motor neurons.42,43

8 Co-factors

Regulatory TFs often activate or repress transcription, either by recruiting the RNA polymerase II machinery, or by preventing its access to the transcription start site. While many TFs interact directly with general TFs or components of RNA polymerase II, others function by interacting with intermediate proteins called co-factors (Fig. 2(E)).

9 Transcription factor variants and disease

TFs play a crucial role in numerous diseases, including congenital disorders and cancer. Mutations in TF-encoding genes can result in loss-of-function, gain-of-function or neomorph TFs that attain a function not shared by the original TF. One of the best-studied TFs mutated in cancer is p53, the product of a tumor suppressor gene that is inactivated by mutation in most human cancers. Interestingly, it appears that some mutations can also convert p53 into an oncogene.48 p53 regulates the expression of various cell cycle inhibitors and proteins involved in apoptosis. It will be interesting to see how the different forms of mutant p53 are affected in their biochemical and biological functions. Mutations in several TFs have been found to result in a variety of human congenital disorders. For instance, altered dimerization between the bHLH TFs Twist1 and Hand2 was found in patients with Saethre–Chotzen syndrome.49 Common neomorph TF variants that are found in instances of leukemia are fusion proteins resulting from chromosomal translocation/inversion (Fig. 2(F)).

10 Conclusions

Although complete genome sequences have provided a great first step toward the comprehensive identification of the compendium of TFs that function in an organism of interest, we are far from having a complete picture of all the protein variants that may exist for each predicted TF. As we discussed here, there are numerous factors that affect the functional states of TFs throughout development, homeostasis, and in disease. Since the gene count is strikingly similar between organisms of widely different complexity, a larger number of TF permutations may contribute to more intricate regulatory networks in higher eukaryotes such as humans. In the future, different TF forms need to be incorporated as individual nodes in transcription regulatory networks to facilitate network modeling and hypothesis derivation (Fig. 3).
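As a rough illustration of what it could mean to treat TF forms as individual nodes, the sketch below expands a gene-level list of TFs into functional units: each modification state of a monomeric TF becomes its own node, and each permitted dimer becomes a single node. Everything here is an assumption made for illustration: the TF names, the allowed pairings, the modification states and the node-naming scheme are hypothetical and are not taken from the article or its figures.

```python
# Hypothetical TFs that bind DNA on their own.
monomers = {"tfM"}

# Hypothetical allowed dimers among obligate dimerizers tfA, tfB and tfC.
# A one-member set stands for a homodimer.
allowed_pairs = {
    frozenset({"tfA"}),          # tfA:tfA homodimer
    frozenset({"tfA", "tfB"}),   # tfA:tfB heterodimer
    frozenset({"tfB", "tfC"}),   # tfB:tfC heterodimer
}

# Hypothetical post-translational states per TF.
modifications = {"tfM": ["unmodified", "phosphorylated"]}


def functional_nodes(monomers, pairs, modifications):
    """Expand gene-level TFs into one node per functional form."""
    nodes = []
    # Each modification state of a monomeric TF is a separate node.
    for tf in sorted(monomers):
        for state in modifications.get(tf, ["unmodified"]):
            nodes.append(f"{tf}[{state}]")
    # Each permitted dimer is a single functional unit.
    for pair in sorted(pairs, key=sorted):
        members = sorted(pair)
        if len(members) == 1:  # homodimer: list the subunit twice
            members = members * 2
        nodes.append(":".join(members))
    return nodes


nodes = functional_nodes(monomers, allowed_pairs, modifications)
```

In this toy case four TF-encoding genes give five functional nodes, and `tfC` never appears as a node by itself because it is only functional as part of the `tfB:tfC` dimer. Ligand-bound states or co-factor complexes could be added as further node types in the same way.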
Acknowledgments

We thank members of the Walhout lab and Job Dekker for discussions and critical reading of the manuscript. Research in the Walhout lab is sponsored by NIH grants DK068429 and DK071713.
References

1. Walhout AJM. Genome Res. 2006;16(12):1445–1454.
2. Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J, Jennings EG, Zeitlinger J, Pokholok DK, Kellis M, Rolfe PA, Takusagawa KT, Lander ES, Gifford DK, Fraenkel E, Young RA. Nature. 2004;431(7004):99–104.
3. Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, Zucker JP, Guenther MG, Kumar RM, Murray HL, Jenner RG, Gifford DK, Melton DA, Jaenisch R, Young RA. Cell. 2005;122(6):947–956.
4. Sandmann T, Jensen LJ, Jakobsen JS, Karzynski MM, Eichenlaub MP, Bork P, Furlong EE. Dev Cell. 2006;10(6):797–807.
5. Mukherjee S, Berger MF, Jona G, Wang XS, Muzzey D, Snyder M, Young RA, Bulyk ML. Nat Genet. 2004;36(12):1331–1339.
6. van Steensel B, Henikoff S. Nat Biotechnol. 2000;18(4):424–428.
7. Moorman C, Sun LV, Wang J, de Wit E, Talhout W, Ward LD, Greil F, Lu XJ, White KP, Bussemaker HJ, van Steensel B. Proc Natl Acad Sci USA. 2006;103(32):12027–12032.
8. Meng X, Brodsky MH, Wolfe SA. Nat Biotechnol. 2005;23(8):988–994.
9. Deplancke B, Dupuy D, Vidal M, Walhout AJM. Genome Res. 2004;14(10B):2093–2101.
10. Deplancke B, Mukhopadhyay A, Ao W, Elewa AM, Grove CA, Martinez NJ, Sequerra R, Doucette-Stamm L, Reece-Hoyes JS, Hope IA, Tissenbaum HA, Mango SE, Walhout AJM. Cell. 2006;125(6):1193–1205.
11. Vermeirssen V, Barrasa MI, Hidalgo CA, Babon JA, Sequerra R, Doucette-Stamm L, Barabási AL, Walhout AJM. Genome Res. 2007;17(7):1061–1071.
12. Borneman AR, Leigh-Bell JA, Yu H, Bertone P, Gerstein M, Snyder M. Genes Dev. 2006;20(4):435–448.
13. Babu MM, Luscombe NM, Aravind L, Gerstein M, Teichmann SA. Curr Opin Struct Biol. 2004;14(3):283–291.
14. Guelzim N, Bottani S, Bourgine P, Képès F. Nat Genet. 2002;31(1):60–63.
15. Shen-Orr SS, Milo R, Mangan S, Alon U. Nat Genet. 2002;31(1):64–68.
16. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U. Science. 2002;298(5594):824–827.
17. Davidson EH, Rast JP, Oliveri P, Ransick A, Calestani C, Yuh CH, Minokawa T, Amore G, Hinman V, Arenas-Mena C, Otim O, Brown CT, Livi CB, Lee PY, Revilla R, Rust AG, Pan Z, Schilstra MJ, Clarke PJ, Arnone MI, Rowen L, Cameron RA, McClay DR, Hood L, Bolouri H. Science. 2002;295(5560):1669–1678.
18. Luscombe NM, Babu MM, Yu H, Snyder M, Teichmann SA, Gerstein M. Nature. 2004;431(7006):308–312.
19. Smith J, Theodoris C, Davidson EH. Science. 2007;318(5851):794–797.
20. Lemon B, Tjian R. Genes Dev. 2000;14(20):2551–2569.
21. Kummerfeld SK, Teichmann SA. Nucleic Acids Res. 2006;34(Database issue):D74–D81.
22. Messina DN, Glasscock J, Gish W, Lovett M. Genome Res. 2004;14(10B):2041–2047.
23. Reece-Hoyes JS, Deplancke B, Shingles J, Grove CA, Hope IA, Walhout AJM. Genome Biol. 2005;6(13):R110.
24. Vermeirssen V, Deplancke B, Barrasa MI, Reece-Hoyes JS, Arda HE, Grove CA, Martinez NJ, Sequerra R, Doucette-Stamm L, Brent MR, Walhout AJM. Nat Methods. 2007;4(8):659–664.
25. Taneri B, Snyder B, Novoradovsky A, Gaasterland T. Genome Biol. 2004;5(10):R75.
26. Ogg S, Paradis S, Gottlieb S, Patterson GI, Lee L, Tissenbaum HA, Ruvkun G. Nature. 1997;389(6654):994–999.
27. Oh SW, Mukhopadhyay A, Dixit BL, Raha T, Green MR, Tissenbaum HA. Nat Genet. 2006;38(2):251–257.
28. Bach I, Yaniv M. EMBO J. 1993;12(11):4229–4242.
29. Johnson JM, Castle J, Garrett-Engele P, Kan Z, Loerch PM, Armour CD, Santos R, Schadt EE, Stoughton R, Shoemaker DD. Science. 2003;302(5653):2141–2144.
30. Wolberger C. Annu Rev Biophys Biomol Struct. 1999;28:29–56.
31. Newman JR, Keating AE. Science. 2003;300(5628):2097–2101.
32. Lamb P, McKnight SL. Trends Biochem Sci. 1991;16(11):417–422.
33. Walhout AJM, Sordella R, Lu X, Hartley JL, Temple GF, Brasch MA, Thierry-Mieg N, Vidal M. Science. 2000;287(5450):116–122.
34. Berk AJ. Biochim Biophys Acta. 1989;1009(2):103–109.
35. Tremblay A, Tremblay GB, Labrie F, Giguère V. Mol Cell. 1999;3(4):513–519.
36. Kim MY, Woo EM, Chong YT, Homenko DR, Kraus WL. Mol Endocrinol. 2006;20(7):1479–1493.
37. Dahlman-Wright K, Cavailles V, Fuqua SA, Jordan VC, Katzenellenbogen JA, Korach KS, Maggi A, Muramatsu M, Parker MG, Gustafsson JA. Pharmacol Rev. 2006;58(4):773–781.
38. Smith CL, O'Malley BW. Endocr Rev. 2004;25(1):45–71.
39. Motola DL, Cummins CL, Rottiers V, Sharma KK, Li T, Li Y, Suino-Powell K, Xu HE, Auchus RJ, Antebi A, Mangelsdorf DJ. Cell. 2006;124(6):1209–1223.
40. Denison MS, Nagy SR. Annu Rev Pharmacol Toxicol. 2003;43:309–334.
41. Denison MS, Heath-Pagliuso S. Bull Environ Contam Toxicol. 1998;61(5):557–568.
42. Qin H, Powell-Coffman JA. Dev Biol. 2004;270(1):64–75.
43. Huang X, Powell-Coffman JA, Jin Y. Development. 2004;131(4):819–828.
44. Roeder RG. FEBS Lett. 2005;579(4):909–915.
45. Rosenfeld MG, Lunyak VV, Glass CK. Genes Dev. 2006;20(11):1405–1428.
46. Perissi V, Rosenfeld MG. Nat Rev Mol Cell Biol. 2005;6(7):542–554.
47. Yu Y, Li W, Su K, Yussa M, Han W, Perrimon N, Pick L. Nature. 1997;385(6616):552–555.
48. Strano S, Dell'Orso S, Di Agostino S, Fontemaggi G, Sacchi A, Blandino G. Oncogene. 2007;26(15):2212–2219.
49. Firulli BA, Krawchuk D, Centonze VE, Vargesson N, Virshup DM, Conway SJ, Cserjesi P, Laufer E, Firulli AB. Nat Genet. 2005;37(4):373–381.
50. Castilla LH, Garrett L, Adya N, Orlic D, Dutra A, Anderson S, Owens J, Eckhaus M, Bodine D, Liu PP. Nat Genet. 1999;23(2):144–146.