Regulation by transcription factors in bacteria: beyond description

Copyright © 2009 The Authors. Journal compilation © 2009 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd.

1 Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
2 Departamento de Ingeniería Genética, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Unidad Irapuato, Mexico
3 Programa de Genómica Funcional de Procariotes, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico

Section Editor: Victor de Lorenzo

Correspondence: Julio Collado-Vides, Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Av. Universidad s/n, Col Chamilpa, 62210, Cuernavaca, Morelos, México. Tel.: +52 777 313 9877; fax: +52 777 317 5581; e-mail collado/at/ccg.unam.mx

Received July 7, 2008; Revised October 16, 2008; Accepted October 17, 2008.

Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5, which does not permit commercial exploitation.

Abstract

Transcription is an essential step in gene expression and its understanding has been one of the major interests in molecular and cellular biology. By precisely tuning gene expression, transcriptional regulation determines the molecular machinery for developmental plasticity, homeostasis and adaptation. In this review, we convey the main ideas or concepts behind regulation by transcription factors and give just enough examples to sustain these main ideas, thus avoiding a classical enumeration of facts. We review recent concepts and developments: cis elements and trans regulatory factors, chromosome organization and structure, transcriptional regulatory networks (TRNs) and transcriptomics.
We also summarize important new discoveries that will probably affect the direction of research in gene regulation: epigenetics and stochasticity in transcriptional regulation, synthetic circuits and plasticity and evolution of TRNs. Many of the new discoveries in gene regulation have not been extensively tested with wet-lab approaches. Consequently, we review this broad area in Inference of TRNs and Dynamical Models of TRNs. Finally, we have stepped backwards to trace the origins of these modern concepts, synthesizing their history in a timeline schema.

Keywords: regulatory network inference, regulatory network plasticity, chromosome structure, dynamical models of regulatory networks, regulatory network

Introduction: cis elements and trans regulatory factors

Transcriptional regulation emerges from the interaction between trans factors (Latin for ‘far side of’) that bind to cis-regulatory elements (Latin for ‘this side of’) in the context of a particular chromatin/chromosome structure. Taking the double-stranded DNA molecule as a reference, cis elements are all those DNA regions – encoded in a plasmid or in a chromosome – in the vicinity of a gene. Complementarily, all the diffusible cellular molecules that are able to bind to the DNA are the trans factors. The joint activity of these molecular entities constitutes the minimal transcriptional regulatory system in all living organisms. In bacterial chromosomes, a transcription unit (TU) is the ordered assembly of the following genetic entities: a regulatory region, a transcription start site, one or more ORFs and a transcription termination site. When a TU comprises more than one ORF, the transcribed mRNA is called polycistronic; otherwise, it is called monocistronic. It is not uncommon for genes to be transcribed from several promoters; thus, TUs overlap. The collection of overlapping TUs constitutes an operon.
Although the operon was historically defined as a polycistronic TU, it has been observed that operons always contain a promoter that transcribes the whole set of genes making up its TUs. The regulatory region contains cis elements such as the promoter – where the RNA polymerase initially binds – and transcription factor-binding sites (TFBS) – where transcription factors (TFs) bind to modulate the binding of the RNA polymerase (Browning & Busby, 2004). In prokaryotes, these regions occupy up to 400 base pairs (bp) (Collado-Vides et al., 1991). Transcription initiation in bacteria requires proteins known as sigma factors (σ). These factors – of which a single genome can encode dozens of different types – are essential for proper promoter recognition by RNA polymerase (Maeda et al., 2000; Helmann, 2002; Paget & Helmann, 2003; Kazmierczak et al., 2005). In bacteria, σ factors are divided into two main phylogenetic families: σ70 and σ54. The σ70 family includes the housekeeping σ that accounts for most gene transcription under normal conditions. One subgroup of this family comprises a varying number of proteins known as extracytoplasmic function (ECF) σ factors, which are activated in response to environmental stress. Usually, every bacterium has one protein member from the σ54 family. RNA polymerase associated with a member of this family recognizes promoters that are different from those recognized when it is associated with a member of the σ70 family. However, there are exceptions where two different σ factors bind to the same promoter (Weber et al., 2005; Wade et al., 2006; Typas et al., 2007). Most σ factors have one anti-σ protein that binds to its cognate σ, inhibiting its action. The σ activity depends on σ/anti-σ ratios and the mechanisms to dissociate σ/anti-σ complexes are diverse (Hughes & Mathee, 1998).
Also, there are post-translational mechanisms that modulate the activity of TFs and σ factors such as proteins of transport systems that sequester the factors, releasing them only when special conditions are encountered (Martinez-Antonio & Collado-Vides, 2008). TFs are classified in several families based on at least two domains, which allow them to function as regulatory switches (Jacob, 1970). One domain functions as a signal sensor by ligand-binding or protein–protein interaction. In many cases, the ligand is a metabolite or a physicochemical signal that conduits the endogenous or environmental information (Ptashne & Gaan, 2002; Martinez-Antonio et al., 2006). The other domain is the responsive element of the switch that directly interacts with a target DNA sequence or TFBS. In bacteria, the helix–turn–helix domain is the most common (Madan Babu & Teichmann, 2003a; Seshasayee et al., 2006). Also, in bacteria, most of these domains are present in one single protein, except for two-component systems (Ulrich et al., 2005). Classically, in these systems, when the sensor protein – usually localized in the cell periplasm – senses an exogenous condition, it phosphorylates itself and its cytoplasmic partner, which has a transcriptional regulatory activity (Mascher et al., 2006). These two-component systems work as a unit: evidences from Escherichia coli show that 26 of the 29 pairs are encoded in the same operon (Janga et al., 2007a). In general, negative regulators bind to the promoter, interfering directly with RNA polymerase; in contrast, positive regulators bind to the promoter's upstream region, helping to recruit the polymerase and start transcription (Collado-Vides et al., 1991; Madan Babu & Teichmann, 2003b). TFs usually work as homodimers, tetramers, hexamers and even, in a few cases, as heterodimers (Goulian, 2004). TFs work in concert and a regulatory region can be occupied by several TFs. 
One of the causes of this crowding of the DNA by TFs in some regulatory regions is the degeneracy of TF–TFBS interaction, i.e. there are different sites that are able to recruit the same TF and different TFs that can recognize similar sites. For example, overlapping regulons like E. coli's SoxS, MarA and Rob arise because of TF–TFBS degeneracy (Martin & Rosner, 2002). The regulatory effect depends on the TF concentration and TF–TFBS affinity: to function, weak sites require high concentrations of TFs; in contrast, strong sites work with a lower amount (Alon, 2007a, b). Also, compared with local TFs that tend to have high-affinity sites, global TFs are less specific, bind to a larger collection of sites and must be expressed at higher levels (Lozada-Chavez et al., 2008; Martínez-Antonio et al., 2008). Furthermore, there are TFs with a dual regulatory role, being activators and repressors at the same time. One simple example are TFs that bind to a single site in the intergenic region between divergently transcribed units, regulating each one of them in a different manner. This is a common theme in sugar catabolism loci where a structural operon is activated, whereas the gene that codes for the TF itself is repressed. An alternative process by which dual regulation works is by the interplay between TF concentration and binding site strength: imagine two TFBSs for the same TF, a weak negative site inside a promoter and a strong positive site next to it. When the TF concentration is low, the strong positive site recruits the TF and transcription is promoted. As the TF concentration increases, the strong site saturates and the weak site begins to be occupied, thus preventing the union of the polymerase to the promoter. The transcriptional regulator factor for inversion stimulation (Fis) has a dual function over some TUs using the previous strategy (Weinstein-Fischer & Altuvia, 2007). 
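The concentration-dependent dual regulation described above can be illustrated with a minimal equilibrium-occupancy sketch. It assumes simple Hill-type binding and a hypothetical rule that the promoter is active only when the strong activating site is bound and the weak repressing site is free; the function names and the dissociation constants (`k_strong`, `k_weak`) are illustrative assumptions, not measured values for Fis or any other TF.

```python
def occupancy(tf_conc, k_d, hill=1.0):
    """Equilibrium probability that a site is bound.

    k_d is the TF concentration giving half-maximal occupancy:
    a small k_d models a strong site, a large k_d a weak site.
    """
    return tf_conc**hill / (k_d**hill + tf_conc**hill)


def promoter_activity(tf_conc, k_strong=1.0, k_weak=100.0):
    """Relative promoter activity under the assumed rule:
    active when the strong (positive) site is bound AND
    the weak (negative) site inside the promoter is free.
    """
    activator_bound = occupancy(tf_conc, k_strong)
    repressor_bound = occupancy(tf_conc, k_weak)
    return activator_bound * (1.0 - repressor_bound)


if __name__ == "__main__":
    # Activity rises while only the strong site fills, then falls
    # as the weak site starts to be occupied at high concentration.
    for conc in (0.01, 0.1, 1, 10, 100, 1000, 10000):
        print(f"TF = {conc:>8}: activity = {promoter_activity(conc):.3f}")
```

Under these assumptions the same TF behaves as an activator at intermediate concentrations and as a repressor at high ones, which is the qualitative behavior the text attributes to the interplay between TF concentration and site strength.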
It is not yet possible to predict the regions of DNA binding from protein structure and experimental mapping is necessary. In general, the number of genes encoding TFs increases with the number of total genes. In particular, in bacterial genomes this increment is proportional to the squared number of genes, suggesting that the increase in genome size is followed by a greater regulatory complexity (Cases et al., 2003; van Nimwegen, 2003; Aravind et al., 2005; Molina & van Nimwegen, 2008). Also, genes in small genomes are relatively more clustered in operons compared with genes in larger genomes (Moreno-Hagelsieb, 2006). However, recent evidences support the idea that the average number of TFBS per regulatory region is independent of genome size (Molina & van Nimwegen, 2008). (Box 1).
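The quadratic scaling mentioned above can be made concrete with a short calculation. This sketch assumes a pure power law (TF count proportional to gene count raised to `exponent`) with no fitted prefactor, so it only yields ratios between two genomes; the genome sizes in the usage lines are hypothetical.

```python
def tf_fold_change(genes_small, genes_large, exponent=2.0):
    """Fold change in TF count between two genomes,
    assuming N_TF is proportional to N_genes**exponent."""
    return (genes_large / genes_small) ** exponent


def tf_fraction_fold_change(genes_small, genes_large, exponent=2.0):
    """Fold change in the fraction of genes that encode TFs."""
    size_ratio = genes_large / genes_small
    return tf_fold_change(genes_small, genes_large, exponent) / size_ratio


if __name__ == "__main__":
    # Hypothetical genomes of 2000 and 4000 genes.
    print(tf_fold_change(2000, 4000))
    print(tf_fraction_fold_change(2000, 4000))
```

With an exponent of 2, doubling the gene count quadruples the expected number of TFs, so the share of the genome devoted to TFs doubles as well.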
Chromosome organization and structure

Chromosome compactness might represent a physical constraint to transcription initiation (Willenbrock & Ussery, 2004; Marr et al., 2008). Recent studies suggest that the E. coli chromosome is arranged in structural domains with a loop-like conformation, with sizes that range from 10 to 117 kb (Postow et al., 2004; Gitai et al., 2005). The packing of some regions depends on the activity of nucleoid-associated proteins: in bacteria, these are DNA-bending [integration host factor (IHF), HU and Fis] and DNA-bridging proteins [histone-like protein (H-NS)]. The expression of these proteins depends on the growth phase, suggesting a correlation between growth and nucleoid structure (Ali Azam et al., 1999; Luijsterburg et al., 2006; Zimmerman, 2006). In addition, DNA isomerases, DNA chaperones and accessory proteins also regulate DNA access, coiling, bending and packing. Fis recognizes specific TFBSs and in some DNA regions (100–200 bp) clusters of high-affinity Fis sites can be found. However, Fis may also bind nonspecifically to stabilize DNA loops (Skoko et al., 2006). As opposed to Fis-induced bending, H-NS is a condensing agent of the DNA. However, surprisingly, some experiments have shown that it can also have the opposite relaxing effect (Dorman, 2004). It has been suggested that one of the functions of H-NS is to silence horizontally acquired genes, especially those of low GC content (Navarre et al., 2006). Chromosome size in bacteria ranges from c. 0.5 Mbp (intracellular pathogens and endosymbionts) to c. 9 Mbp (free-living bacteria) (Cordero & Hogeweg, 2007; Vinuelas et al., 2007). A chromosome contains from hundreds to thousands of genes that are encoded in both leading and lagging DNA strands. There is a preference for essential and highly expressed genes (such as those for ribosomal proteins) to be localized in the leading strand near the origin of replication (Rocha, 2004).
The strategic orientation of these genes has been explained as an advantage for efficient transcription, for example to avoid head-on collisions between the transcription and the replication machinery (Brewer et al., 1992; Mirkin et al., 2006). The G+C content differs among genomes, although regulatory regions have a rich A+T content, an observation related to the access of the transcriptional machinery (Dekhtyar et al., 2008).

Epigenetics in transcriptional regulation

Inherited stable changes in cell functioning that cannot be explained as the result of mutations or modifications in the DNA sequence are considered epigenetic (Bird, 2007). Specific molecular mechanisms are responsible for the transmission of particular acquired characteristics in a nongenetic manner: biochemical modifications in DNA or DNA-binding proteins can act as epigenetic markers. Bacterial DNA can be methylated in several ways, resulting in N4-methyl-cytosine (m4C), N6-methyl-adenine (m6A) and C5-methyl-cytosine (m5C). Among these three chemical markers, m6A has been clearly related to epigenetic transcriptional regulation besides its relation to other cellular processes (Casadesus & Low, 2006). Epigenetic markers are conserved through bacterial generations thanks to the capacity of methyltransferases to recognize preferentially hemimethylated DNA. This covalent modification can alter the interactions of restriction enzymes or regulatory proteins with DNA by a direct steric effect. In E. coli, many genes such as dnaA and trp can be regulated by Dam methyltransferase (Low et al., 2001). A well-studied specific example of epigenetic inheritance by DNA methylation is the switching of the pap operon in uropathogenic E. coli. The operon is regulated by the interplay of two leucine-responsive protein (Lrp)-binding sites. In the repressed state, Lrp binds the proximal site, interfering with transcription, and Dam methylates the distal site, blocking Lrp binding.
The operon is derepressed when PapI dimerizes with Lrp. The PapI–Lrp complex has a higher affinity for the distal site, thus freeing the proximal site from Lrp. Dam methylates the proximal site and transcription begins (Hernday et al., 2002). Either of the two states of the pap operon is passed on to daughter cells using the methylation signal. It is not always necessary to have molecular markers for epigenetic inheritance. One commonly overlooked example, often misconceived as trivial, is the transmission to the daughter cells of the cellular components in the mother's cytoplasm in every cell division cycle. The cytoplasm contains specific factors that prime the daughter's transcription in order to recover the transcription state of the mother cell. For example, it is known that low levels of the gratuitous inducer isopropyl β-d-1-thiogalactopyranoside (IPTG) do not derepress the lac operon. However, once high IPTG concentrations have induced the transcription of the operon, it is possible to lower the IPTG concentration to noninducing levels and still keep the previously induced colony in the induced state. This is because daughters of preinduced mothers have a high level of β-galactoside permease in their membranes. This allows them to import IPTG, even at low concentrations, and keep the lac operon derepressed (Casadesus & D'Ari, 2002).

Transcriptional regulatory networks

The direct influence of TFs over the transcription activity of different target genes (TG) is customarily drawn in a network of causal relationships known as a transcriptional regulatory network (TRN) (McAdams & Arkin, 1998; Thieffry & Thomas, 1998; Lee et al., 2002a).
The network representation unveils the global organization of transcriptional regulation such as its modular and hierarchical structure (Thieffry & Romero, 1999; Ihmels et al., 2002; Segal et al., 2003; Wolf & Arkin, 2003; Barabasi & Oltvai, 2004; Resendis-Antonio et al., 2005; Yu & Gerstein, 2006; Martínez-Antonio et al., 2008) or the fact that on average every TG is controlled by two TFs (Albert, 2005; Aldana et al., 2007). One natural unit in TRNs is the regulon: a set of TGs coregulated by the same set of TFs; this concept was originally defined as the group of genes subject to the exclusive regulation of one TF (Maas, 1964). Regulons are divided into simple or complex if regulated by a single or by multiple TFs, correspondingly. The majority of regulons in bacteria correspond to the last category (Gutierrez-Rios et al., 2003). The E. coli TRN seems to be dominated by probably <10 global TFs (Martinez-Antonio & Collado-Vides, 2003). Local TFs usually act in concert with global TFs and are also regulated by them, forming a feedforward loop motif (Alon, 2007a, b). In E. coli, most of the local TFs tend to be encoded in close chromosomal proximity with one of their regulated genes (Janga et al., 2007a). In addition to simple horizontal cotransfer, a biophysical explanation for local TFs and TGs colocalization is that, because the number of local TF molecules is low, they must be close to their regulated target in order to quickly reach their binding site by jumping and sliding along the DNA molecule (Kolesov et al., 2007; Wunderlich & Mirny, 2008). As a rule, global TFs do not regulate each other directly, a phenomenon known as ‘hubs repulsion’ or disassortativity (Song et al., 2006; Takemoto & Oosawa, 2007). 
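The notions of regulon and feedforward loop introduced above can be sketched on a toy directed graph. The network below is a deliberately tiny illustration built around the textbook arabinose and lactose examples; it is an assumption for demonstration, not a curated extract of the E. coli TRN, and the helper names (`regulators_of`, `regulons`, `feedforward_loops`) are hypothetical.

```python
from itertools import permutations

# Toy TRN: each TF (key) maps to the set of targets it regulates.
# Lower-case keys name the genes that encode the TFs.
toy_trn = {
    "crp": {"araC", "araBAD", "lacZYA"},
    "araC": {"araBAD"},
    "lacI": {"lacZYA"},
}


def regulators_of(network, target):
    """Return the set of TFs that directly regulate a target."""
    return {tf for tf, targets in network.items() if target in targets}


def regulons(network):
    """Group targets by their exact set of regulators.

    A key with one TF is a simple regulon; a key with several
    TFs is a complex regulon in the sense used in the text.
    """
    grouped = {}
    for target in set().union(*network.values()):
        key = frozenset(regulators_of(network, target))
        grouped.setdefault(key, set()).add(target)
    return grouped


def feedforward_loops(network):
    """Find triples (x, y, z) where x regulates y and both regulate z."""
    loops = []
    for x, y in permutations(network, 2):
        if y in network[x]:
            for z in network[x] & network[y]:
                if z not in (x, y):
                    loops.append((x, y, z))
    return sorted(loops)


if __name__ == "__main__":
    print(regulons(toy_trn))
    print(feedforward_loops(toy_trn))
```

Representing the network as a mapping from TF to target set keeps both queries short; signs of regulation (activation or repression) are omitted here and would be needed to distinguish coherent from incoherent loops.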
As a general observation, the promiscuity of a TF for binding sites diminishes as its local character augments (Lozada-Chavez et al., 2008), and global and local regulators tend to coordinate jointly a general and a particular condition (Balaji et al., 2007; Janga et al., 2007b). Global TFs and some recently duplicated TF pairs can coregulate some TUs, forming a network motif named bifan (Shen-Orr et al., 2002). In fact, this motif is a particular class of the complex regulons coordinated by only two TFs. Escherichia coli, for instance, has regulons with as many as four to six TFs mutually affecting expression of their TGs. The transcriptional response concentrating regulatory changes – triggered by environmental signals – is partitioned by global TFs as well as by sigma promoter subsets. For example, this is evident when considering E. coli's σ interactions, which give a very clear separation of gene subsets participating coordinately in heat shock (σ32; Nonaka et al., 2006), stress response (σE; Johansen et al., 2006) and stationary phase (σS; Typas et al., 2007), among others. Local regulators and nucleoid-associated factors (many of them global TFs) affect the transcription rate of TGs in drastically distinct ways. Evidence shows that nucleoid-associated TFs and DNA supercoiling induce continuous changes in the transcription rate, whereas local TFs induce discrete changes (i.e. On/Off transcription states). These two aspects have been compared with the analog and digital components of electronic devices (Blot et al., 2006; Marr et al., 2008).

Plasticity and evolution of TRN

Thanks to the availability of hundreds of sequenced bacterial genomes, one can consider the following evolutionary question: in bacteria, to what extent are TRNs conserved? Recent studies show that TFs evolve much faster than their TGs, suggesting that TRNs in bacteria are highly flexible and dynamic (Lozada-Chavez et al., 2006; Madan Babu et al., 2006).
Several reports that analyze different components of TRNs strongly support their plasticity. For example, multiple evidences show that nonorthologous TFs control equivalent pathways, for example the nonorthologous NagC, NagR and NagQ regulate the utilization of N-acetylglucosamine and chitin in various groups of proteobacteria (Meibom et al., 2004; Yang et al., 2006). In contrast and to a lesser extent, orthologous regulators may control distinct pathways in different species, for example the orthologous Fur (Alpha-, Beta-, Gammaproteobacteria, bacilli and cyanobacteria) and Mur (alphaproteobacterial rhizobial species Rhizobium leguminosarum and Sinorhizobium meliloti) regulate iron homeostasis and manganese uptake, respectively (Rodionov et al., 2006). Also, even global TFs do not necessarily regulate similar metabolic responses in different organisms (Friedberg et al., 2001; Suh et al., 2002; Derouaux et al., 2004; Moreno-Campuzano et al., 2006). Likewise, as phylogenetic distances decrease, TFBS conservation increases (Makarova et al., 2001; Mazon et al., 2004). However, there are some exceptions to this rule: TFBSs of BirA (regulation of biotin biosynthesis) are highly conserved in Bacteria and Archaea (Rodionov et al., 2002), while TFBSs of ArgR/AhrC (control of arginine regulon) and NrdR (ribonucleotide reductase regulon) are strongly conserved in Bacteria (Makarova et al., 2001; Rodionov & Gelfand, 2005). This suggests that biotin, arginine and ribonucleotide reductase regulatory sites may be ancient. In addition, bacterial species that live in ever changing environments have a tendency to increase the number of encoded stress-responsive TFs and σ ECF; this may be a simple effect of a larger number of regulators encoded in larger genomes (Helmann, 2002). Finally, studies in E. coli show that some parts of its TRN are more conserved if they are involved in basic processes (Cosentino Lagomarsino et al., 2007; Salgado et al., 2007). 
Several evolutionary processes, such as duplication and horizontal gene transfer (HGT), must be studied to understand TRN flexibility. For example, loss and duplication of TFs and TFBS may result in regulon expansion, shrinkage, fusions, fissions and even creation and destruction. It is possible to see the contribution of gene duplication at all levels of TRNs (Teichmann & Babu, 2004), although it seems to be more frequent at the bottom layers (Cosentino Lagomarsino et al., 2007; Lozada-Chavez et al., 2008). There are coordinated TF–TG duplications in bacterial TRNs. These events account for 38% of the regulatory interactions in E. coli's TRN and 45% in S. cerevisiae's TRN (Teichmann & Babu, 2004; Zhang et al., 2005). The percentages were obtained considering only paralogy within each species; this can mask a convergent evolution within paralogs. For E. coli, the previous percentage contrasts with the 8% obtained when HGT events are excluded and only the regulatory interactions that arose within the E. coli lineage are counted (Price et al., 2008). Although most TFs have paralogs, they seem to have arisen by HGT rather than by gene duplication within the E. coli lineage (Price et al., 2008). Moreover, it seems that, in horizontal transfer events, local regulators flow more easily within near phylogenetic distances than global regulators (Lercher & Pal, 2008; Price et al., 2008). Therefore, global regulators are gained and lost more slowly and are even prone to undergoing a slower sequence evolution than other regulators within a bacterial lineage (Rajewsky et al., 2002; Price et al., 2008). This fact does not ensure the maintenance of their global functional role (Friedberg et al., 2001) because the property of global regulation depends on several evolutionary forces and on the particular molecular properties of each TF (Lozada-Chavez et al., 2008).
In addition, genes recently transferred have low expression levels; probably this is a sign of slow but steady integration of transferred genes into the existing regulatory circuits (Taoka et al., 2004; Price et al., 2008). In E. coli, the evolutionary rate of TFBSs of horizontal transferred TGs is fast but gradually decelerates with the age of horizontal transfer (Lercher & Pal, 2008). These facts show that TFs and their TFBSs can evolve largely independently, allowing genes to join or leave regulons and allowing regulatory regions to increase their complexity by augmenting the quantity and type of cis-regulatory interactions. HGT, complex gene duplication events and an accelerated sequence divergence may mask the discovery of orthologs, making comparative studies of TRN a particularly difficult task; see Box 2.
The regulatory network of E. coli can be perturbed globally, rewiring it to a great extent; this might be a consequence of the inherent plasticity of TRNs. For example, Isalan et al. (2008) reconnected some global and local regulators, and also σ factors, by transforming wild-type strains with constructs of almost all possible combinations of these genes with their different promoters. They rewired the network in 600 different ways, every time adding up to five new interactions. Remarkably, in a wild-type genome background, bacterial colonies are viable in 95% of the cases. Another example of network perturbation in a wild-type background shows that mutations in the housekeeping σ factor induce global rewiring (Alper & Stephanopoulos, 2007). The authors show how this rewiring more efficiently solves several problems of metabolic optimization thanks to the interplay of many changes in gene expression that make possible the exploration of complex phenotypes. These results must be contrasted with metabolic networks, where enzymes have great specificity for their substrates and many catabolic and anabolic pathways are highly conserved. In this respect, metabolic networks appear to be stiff; in contrast, TRNs seem to be loose. TFs bind to a broad spectrum of binding sites with different affinity and change targets widely among species. In the light of the previous facts, the rapid adaptation of bacterial organisms to almost every niche on earth is explained to a great extent by the plasticity of transcriptional regulation.

Stochasticity in transcriptional regulation

During transcription, TFs are continually binding to or unbinding from different sequences in the DNA. The greater the affinity, the longer they remain bound. If the sequence is regulatory, there is a likelihood that the rest of the transcription machinery assembles and begins transcription before the TF detaches from the DNA because of thermal fluctuations.
In this picture, there is no natural threshold in affinity above which TFs undoubtedly induce transcription. In general, there are a variety of binding sites and for every one of them a TF will have a different affinity, inducing, with some probability, transcription. When promoters are strong and TFs abound, transcription is certain and has a well-defined rate (Elowitz et al., 2002). However, when promoter strength is weak or TF numbers oscillate around the dozens, stochastic fluctuations in the mean TF numbers are very large and transcription becomes ‘noisy’. In transcription, variability in the number of messages arises from two sources of noise: one intrinsic and the other extrinsic. In a hypothetical cell with two identical genes, intrinsic noise would cause differences in their number of transcripts. This effect is analogous to the tossing of two identical coins that do not generate the same sequence of heads and tails. Extrinsic noise originates from the cell-to-cell variation of cellular components, for example the exact number of polymerase molecules. Elowitz et al. (2002) measured the individual contribution of the two components of noise by the ingenious construct of two fluorescent proteins of different colors in the same plasmid that were subjected – every one of them separately – to the control of a promoter with the same sequence. Transformed with this construct, individuals of ‘noisy’ strains appear under the microscope with any of the two possible colors (intrinsic noise high). In quiet strains, every individual appears with the same color obtained when combining equal quantities of the two fluorescent proteins (intrinsic noise low). Extrinsic noise is obtained when comparing the fluorescence intensity among cells of the same strain. One fact with profound consequences in the cell fate decision is the metastable gene expression patterns originating from the random fluctuations of the expression of individual genes. 
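The two-reporter decomposition of noise described above can be written as a short calculation. The sketch below uses the commonly cited estimators for the dual-reporter setup (squared intrinsic noise from the normalized mean squared difference between the two reporters, squared extrinsic noise from their normalized covariance); the function name and the fluorescence values in the usage lines are hypothetical, and real data would need background subtraction and channel scaling first.

```python
from statistics import fmean


def noise_decomposition(reporter_a, reporter_b):
    """Split expression noise from paired single-cell measurements.

    reporter_a[i] and reporter_b[i] are the two reporter intensities
    measured in the same cell i. Returns squared noise terms
    (intrinsic, extrinsic, total), where total = intrinsic + extrinsic.
    """
    mean_a = fmean(reporter_a)
    mean_b = fmean(reporter_b)
    norm = mean_a * mean_b
    pairs = list(zip(reporter_a, reporter_b))

    # Intrinsic: the two copies differ within the same cell.
    intrinsic_sq = fmean([(a - b) ** 2 for a, b in pairs]) / (2 * norm)
    # Extrinsic: the two copies vary together from cell to cell.
    extrinsic_sq = (fmean([a * b for a, b in pairs]) - norm) / norm
    total_sq = (fmean([a * a + b * b for a, b in pairs]) - 2 * norm) / (2 * norm)
    return intrinsic_sq, extrinsic_sq, total_sq


if __name__ == "__main__":
    # Hypothetical "quiet" strain: both reporters track each other.
    quiet_a = [10.0, 20.0, 30.0]
    quiet_b = [10.0, 20.0, 30.0]
    print(noise_decomposition(quiet_a, quiet_b))
```

In this picture a strain whose cells show the blended color has a small intrinsic term, while cell-to-cell differences in overall brightness show up in the extrinsic term.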
The metastability is attained thanks to TRNs that amplify random fluctuations of gene expression and then sustain stable patterns over biologically relevant lapses of time. This causes growing isogenic colonies of microorganisms to differentiate spontaneously into subcolonies of specialized ‘cell types’ (Maamar et al., 2007; Suel et al., 2007; Chai et al., 2008). Any single cell from an original isogenic colony can give rise, in turn, to descendants that differentiate into subcolonies in the same proportion as the ones in the original colony.

Transcriptomics

At present, there are basically two options to probe the transcriptional state of the cell: microarrays and ultra-high-throughput sequencing. In the first technology, different single-stranded DNA probes are designed and arrayed to monitor the mRNA expression of different genes. These transcriptional products, isolated from a culture sample, are tagged with fluorescent dyes and then hybridized in the microarray against their complementary sequences. The intensity of the fluorescence, in the different locations of the array, gives an estimate of the abundance of the different probed transcripts. Microarray technology has been refined since its first appearance in the mid 1990s, when arrays detected exclusively annotated ORFs (Schena et al., 1995). Today, state-of-the-art microarray technology is represented by high-density whole-genome tiling arrays. In this implementation, the arrayed set of probes is richer, containing, for example, DNA probes for both intragenic and intergenic regions. This improvement allows for the identification of complex transcript structures – such as genes in operons – as well as novel short transcripts – such as small RNAs – that would be missed by previous low-density arrays (Reppas et al., 2006). The raw data generated from microarrays must be transformed in two steps: correction for background noise and normalization.
The first transformation attempts to eliminate the contribution from unspecific hybridization; the second transformation intends to make gene intensities from different experiments comparable (Quackenbush, 2002). The widespread use of this technology has led to the appearance of useful databases with collections of hundreds of arrays of different bacterial organisms under diverse experimental conditions (Demeter et al., 2007; Faith et al., 2008; Kanehisa et al., 2008). There are particular problems that are inherent to microarray technology. For example, prior selection of probes in the arrays biases the possible set of transcripts that can be detected; unspecific hybridization cannot be completely eliminated; the differential efficiency of probes makes it impossible to compare the expression of different genes in the same sample, etc. It appears that the solution to these problems is to use the sheer brute force of massive sequencing with the new ultra-high-throughput sequencing technologies (Bennett et al., 2005; Margulies et al., 2005). The idea is simple: sequence all the transcripts that the cell expresses under a particular condition and then map these sequences back to their corresponding regions on the genome to detect presence or absence (Nagalakshmi et al., 2008). Note that the detection of transcripts is not conditioned on a possibly biased set of probes nor on the resolution of the array. This translates into the possible discovery of new gene products. Also, the effect of unspecific hybridization is not present in the sequencing, and comparison between gene transcript levels is possible because the number of sequenced transcripts is directly counted. At least one study has compared microarray and sequencing technology, showing that data in the latter are highly replicable and that the sequencing technology can detect differentially expressed genes between two samples at a higher positive discovery rate (Marioni et al., 2008). 
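The two microarray preprocessing steps described above, background correction and normalization, can be sketched in a few lines. This is a deliberately simple illustration (local background subtraction with a floor, followed by median scaling across arrays) and is an assumption chosen for clarity, not the procedure of any cited study; production pipelines typically use more elaborate models.

```python
from statistics import median


def background_correct(intensities, background, floor=1.0):
    """Subtract the local background estimate from each spot.

    Values are clipped at `floor` so that later log transforms
    remain defined when background exceeds signal.
    """
    return [max(i - b, floor) for i, b in zip(intensities, background)]


def median_normalize(arrays, target=None):
    """Scale each array so that all arrays share the same median.

    `arrays` is a list of intensity lists, one per experiment.
    If no target is given, the mean of the array medians is used.
    """
    medians = [median(a) for a in arrays]
    if target is None:
        target = sum(medians) / len(medians)
    return [[v * target / m for v in a] for a, m in zip(arrays, medians)]


if __name__ == "__main__":
    # Hypothetical raw spot intensities and background estimates.
    raw = [[120.0, 260.0, 340.0], [230.0, 450.0, 700.0]]
    bkg = [[20.0, 60.0, 40.0], [30.0, 50.0, 100.0]]
    corrected = [background_correct(r, b) for r, b in zip(raw, bkg)]
    print(median_normalize(corrected))
```

After scaling, intensities for the same gene can be compared across experiments; comparing different genes within one array remains limited by probe efficiency, as the text notes.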
The processing of transcription data, and the rationale behind that processing, is as important as the technology used to probe transcription. The traditional data workflow screens for differentially expressed genes; this procedure has been described, pejoratively, as a fishing expedition (Gibson, 2003). This criticism indirectly points to the fact that the community lacks methods to synthesize gene expression data and methods to analyze this synthesis at higher levels of description, for example gene expression data organized coherently in TRNs or genes of related function sorted into functional classes. One way to amend this situation is the use of a clustering method known as Self-Organizing Maps. This clustering reorganizes transcription data in such a way that genes with similar expression levels are contiguously located in a square lattice, generating an image of the state of the transcriptome. Surprisingly, with this reordering, it is possible to sort out different cellular functional states just by looking at the image, a gestalt analysis (Guo et al., 2006). Another method of higher-level analysis is to take advantage of decades of molecular knowledge and organize transcriptional data into sets of genes that together perform a cellular process (Subramanian et al., 2005). This gene set analysis has a higher statistical power to detect changes at the gene set level that would go unnoticed at the single-gene level (Box 3).
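The gain in statistical power from working at the gene set level can be illustrated with a simple permutation test. The sketch below is not the method of Subramanian et al. (2005); it is a simplified stand-in that compares the mean score of a gene set against random sets of the same size. All gene names and scores are synthetic assumptions.

```python
import random


def gene_set_pvalue(scores, gene_set, n_perm=2000, seed=0):
    """One-sided permutation p-value for the mean score of a gene set.

    scores   -- dict mapping gene name to a per-gene statistic (e.g. a log-ratio)
    gene_set -- names of the genes forming the set of interest
    The null distribution comes from random gene sets of the same size.
    """
    rng = random.Random(seed)
    genes = list(scores)
    k = len(gene_set)
    observed = sum(scores[g] for g in gene_set) / k
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, k)
        if sum(scores[g] for g in sample) / k >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic data: 90 background genes with scores spread over [-1, 1] ...
scores = {f"bg{i}": ((i * 37) % 21 - 10) / 10 for i in range(90)}
# ... and a 10-gene set in which every member shows a modest shift of 0.6.
pathway = [f"set{i}" for i in range(10)]
scores.update({g: 0.6 for g in pathway})

observed, p_value = gene_set_pvalue(scores, pathway)
```

No member of the set is individually extreme (several background genes score higher), yet the coordinated shift of the whole set is very unlikely under random sampling, which is the effect described in the text.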
Synthetic transcriptional regulatory circuits

The previous sections show the detailed knowledge we have accumulated on transcriptional regulation by TFs. The synthesis of TRNs attempts to go from this understanding to a rational transcriptional network design. It aspires to integrate new complex functions into cell behavior: not just the addition of stationary properties, such as the constitutive expression of exogenous proteins, but the addition of the dynamically controlled expression of complete gene programs. Several early examples show the feasibility of integrating such programs into cell behavior: rational design of memory circuits (Ajo-Franklin et al., 2007), insertion of complete regulated metabolic pathways (Pfleger et al., 2006), toggle switches (Gardner et al., 2000), oscillators (Elowitz & Leibler, 2000) and the creation of new ways of cell–cell communication (Bulter et al., 2004). In all these cases, small gene circuits compute their output based on the external/internal input signal sensed. Promoters controlling the expression of the genes in the circuit are the essential piece to accomplish the required computations, for it is in this element that signals – transduced by TFs – converge and are integrated.

The particular importance of promoters has naturally led to an interest in their characterization and synthesis. With respect to their characterization, it has been shown that, more often than not, the activity of different promoters controlled by two regulators is not a simple OR/AND function (Cox et al., 2007; Kaplan et al., 2008). With regard to their synthesis, completely characterized libraries of synthetic promoters with different strengths are now available; promoter strength was verified indirectly by measuring the specific β-galactosidase activity. Remarkably, more than six orders of magnitude in β-galactosidase activity can be covered using different promoters (Mijakovic et al., 2005).
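To make the idea of a small circuit computing its output from its input concrete, the toggle switch mentioned above can be sketched with the dimensionless two-repressor model commonly associated with Gardner et al. (2000). The parameter values, initial conditions and the simple Euler integration below are illustrative assumptions, not values taken from that study.

```python
def simulate_toggle(u0, v0, alpha1=10.0, alpha2=10.0, beta=2.0, gamma=2.0,
                    dt=0.01, steps=5000):
    """Integrate two mutually repressing genes with the forward Euler method.

    du/dt = alpha1 / (1 + v**beta)  - u
    dv/dt = alpha2 / (1 + u**gamma) - v
    u and v are the (dimensionless) concentrations of the two repressors.
    """
    u, v = u0, v0
    for _ in range(steps):
        du = alpha1 / (1.0 + v ** beta) - u
        dv = alpha2 / (1.0 + u ** gamma) - v
        u, v = u + dt * du, v + dt * dv
    return u, v


# Two nearby initial states settle into opposite stable states (bistability).
u_a, v_a = simulate_toggle(2.0, 1.0)  # slight initial excess of repressor u
u_b, v_b = simulate_toggle(1.0, 2.0)  # slight initial excess of repressor v
```

The same pair of genes ends up in either of two stable expression patterns depending only on a small initial bias, which is the memory-like behavior that makes the toggle switch useful as a circuit element.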
It is also possible to create libraries of regulated promoters by combinatorial synthesis (Cox et al., 2007). This consists of the combinatorial ligation of previously created promoter regions, i.e. sequences that correspond to the distal region (upstream of the −35 box), the core region (between the −35 and −10 boxes) and the proximal region (downstream of the −10 box). These regions contain one operator site for any of the following regulators: LacI, AraC, LuxR or TetR. Using this strategy, thousands of promoters with different regulated strengths can be generated.

Even supposing that completely characterized libraries of different promoters exist, the main challenge in synthetic circuits remains: to integrate these small networks into the cell environment without killing the cell, for example without overproducing toxic intermediates or causing metabolic bottlenecks that would inhibit growth. The problem is to find the exact promoter strengths, with the correct regulatory region, to balance and coordinate the expression of multiple genes. One promising solution is to generate a library of networks and then select the best-performing one under a given criterion. This is the same strategy followed in the directed evolution of proteins, where a library of mutant protein sequences is created and then screened for the best variant of the protein. The difference in the library of networks is that the mutations lie in the noncoding regions that regulate transcription and translation. One example of this approach is the combinatorial synthesis of intergenic regions in operons to tune the translation of polycistronic transcripts (Pfleger et al., 2006). The approach, without a specific design, generates transcripts with slight variations in intergenic regions that change RNase cleavage sites, sequences that sequester ribosome-binding sites and mRNA secondary structures. With this technique, it was possible to introduce a heterologous mevalonate biosynthetic pathway into E. coli by tuning the expression of three genes in an operon. In one last example of combinatorial synthesis, a collection of 125 different networks was produced from these units: five different promoters regulated by three different TFs (LacI, TetR and λ cI) (Guet et al., 2002). Among the networks, it is possible to find positive and negative feedback loops, oscillators and toggle switches. It must be stressed that all these different network functions can be encoded with the same set of genes, the difference residing only in the interaction graph of the constituent genes.

Concluding remarks: the need for integrative schemes

Even though recent progress in unraveling the underlying mechanisms of transcriptional regulation has been spectacular, the community lacks an integrative framework to direct new advances. In this respect, systems biology in bacteria faces the challenge of demonstrating its promised capabilities: new levels of integration and understanding that combine modeling and experiments on the whole network and on cell behavior. To achieve this, there are two complementary procedures: bottom-up and top-down schemes (Beer & Tavazoie, 2004; Bonneau et al., 2007). The former traces its origins to the systems sciences, whose essence is to explore the collective phenomena that emerge when the building parts of a system are integrated (Bruggeman & Westerhoff, 2007). Bottom-up schemes constitute the base to develop mechanistic models that are useful to discern the transcriptional organization by which the cell faces a genetic or an environmental perturbation at a genome scale (Segre et al., 2002; Covert et al., 2004; Resendis-Antonio et al., 2007). On the other hand, top-down procedures require deductive methods, whose main interest is to identify causal interactions between the individual genes measured by high-throughput technologies (Wagner, 2001; de la Fuente et al., 2002).
Successful integration of top-down and bottom-up schemes is not a trivial activity; it requires continual comparison between types of modeling and their experimental verification to reconstruct a coherent explanation of cell activity. Progress here depends on how well simplified models can capture the essentials needed for prediction, and on the fact that biological systems can be engineered in synthetic approaches even though they are extremely interconnected.

This review has focused on regulation by TFs. However, there are other layers of cellular regulation that ultimately influence regulation by TFs. This situation creates feedback loops that transmit information from almost any regulatory layer to any other one in order to maintain cellular homeostasis. This is in clear contrast with the isolated picture of TRNs, where a cellular hierarchical decision-making structure is emphasized. Thus, a major conceptual challenge is to change our way of thinking about causality in a complex system with high connectivity and a considerable amount of circularity, i.e. feedback loops, in the ‘decision’ network of gene regulation at the whole-cell level.

Acknowledgments

We thank S. Gama-Castro for useful comments. We also thank anonymous referee no. 2 for detailed observations. Y.I.B.-M. acknowledges PDCB-UNAM and CONACyT-México for a PhD scholarship; E.B. also thanks CCG-UNAM and CIC-UNAM for a postdoctoral scholarship. We acknowledge support by NIH grant no. R01 GM071962-05 and PAPPIT IN214905.

Statement

Reuse of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5, which does not permit commercial exploitation.

Authors' contribution

L.N.L.-B., A.M.-A., O.R.-A. and I.L.-C. contributed equally to this article.

References