NIH Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
J Mol Biol. Author manuscript; available in PMC Jun 6, 2009.
Published in final edited form as:
PMCID: PMC2491489

The role of DNA-binding specificity in the evolution of bacterial regulatory networks


Understanding the mechanisms by which transcriptional regulatory networks (TRNs) change through evolution is a fundamental problem. Here we analyze this question using data from Escherichia coli and Bacillus subtilis, finding that paralogy relationships are insufficient to explain the global or local role observed for transcription factors (TFs) within regulatory networks. Our results provide a picture in which DNA-binding specificity, a molecular property that can be measured in different ways, is a predictor of the role of transcription factors. In particular, we observe that global regulators consistently display low binding specificities, while displaying comparatively higher expression values in microarray experiments. In addition, in this work we find a strong negative correlation between binding specificity and the number of co-regulators which help coordinate genetic expression at a genomic scale. A close look at several orthologous TFs, including FNR, a regulator found to be global in E. coli and local in B. subtilis, confirms the diagnostic value of specificity in order to understand their regulatory function, and also highlights the importance of evaluating the metabolic and ecological relevance of effectors as another variable in the evolutionary equation of regulatory networks. Finally, a general model is presented that integrates some evolutionary forces and molecular properties, aiming to explain how regulons grow and shrink, as bacteria tune their regulation to increase adaptation.

Keywords: transcription, regulatory network, binding specificity, global regulator, paralogy


The expression of genes can be controlled by transcriptional regulatory mechanisms in response to cellular stimuli. Transcriptional regulation in prokaryotes depends generally upon the recognition of specific DNA operator sites (bsDNA) by transcription factors (TFs). These protein-DNA interactions affect the synthesis of messenger RNA molecules of target genes (TG), which can be activated or repressed. Overall, the set of transcriptional regulatory interactions in a given organism is often called Transcriptional Regulatory Network (TRN). Genomic and statistical analysis of TRNs has shown that transcriptional proteins have a differential connectivity, in which a small set of TFs regulates a much larger set of TGs 1; 2; 3. Even though different criteria have been proposed to define the property of connectivity 4, it is possible to assign TFs one of two functional roles, being either local or global regulators. On the basis of the number of TGs that a TF might regulate and additional features such as the different sigma-classes of promoters, the number of co-regulators and the number of conditions, highly connected TFs are called global regulators. In contrast, a large proportion of TFs in a network affects the expression of only one or few genes. These are called local regulators 5; 6.

It is thought that genetic duplication might be the main evolutionary mechanism rewiring transcriptional networks7, and could also explain the origin of global and local regulators. In particular, Teichmann and Babu 8 have proposed that TRNs evolve by duplication of TFs and TGs which might conserve their regulation or rather gain new regulatory interactions. Genetic duplication indeed accounts for 52% of the TRN in E. coli 8. However, Cosentino and coworkers 9 have concluded that the contribution of this mechanism to the network architecture is maximum within local regulators and TGs and otherwise minimal when global TFs are considered. Besides, although duplication events have been recognized in many different species, TRNs are poorly conserved across bacterial species 10; 11 not only because global regulators do not necessarily share similar evolutionary histories, but also because they do not necessarily regulate similar metabolic responses in different organisms 3; 12; 13; 14; 15; 16; 17. Therefore, we find that there are still important questions to be answered regarding the evolution of regulatory networks. Here we take the two best annotated prokaryotic transcriptional networks, the gram-negative Escherichia coli K12 18 and the gram-positive Bacillus subtilis 19, with remarkably different niches 20 and evolutionary histories 21; 22, in order to address this subject. This work re-evaluates the contribution of genetic duplication, by asking how it is that paralogous TFs acquire different roles in regulatory networks. More explicitly, we aim at identifying distinctive properties required for TFs to evolve as global or local regulators. Firstly, we take the collection of TFs from E. coli and B. subtilis in order to estimate their specificity, defined as the ability to discriminate binding sites along DNA molecules. 
The results obtained demonstrate that binding specificity is strongly correlated with the hierarchical role of TFs within regulatory networks, with global regulators consistently displaying low specificity (LS), while local regulators show high specificity (HS), as already anticipated by different groups. This observation suggests that the ability of TFs to conserve or gain new TGs might depend on this biochemical property. In addition, this work finds that regulatory proteins with low specificity show higher expression values in microarray experiments, perhaps as expected, since they bind to more DNA sites. Furthermore, we find that the degree of co-regulation by more than one TF in E. coli is negatively correlated with the specificity of DNA-binding, and we discuss several biological processes that might explain this observation. To examine our findings, we compare orthologous TFs for which sets of experimentally verified bsDNAs are available in both bacteria, with detailed insight into the FNR (fumarate and nitrate reduction) regulatory protein, confirming that the calculated specificity values are in agreement with their global or local roles. Finally, a general model is presented that summarizes some mechanisms that affect how regulons grow and shrink; in other words, how TFs might gain or lose regulatory interactions as bacteria tune their regulatory networks in order to better respond to their environmental and metabolic requirements. While this paper presents evidence about the importance of binding specificity and co-regulation, the model also includes two variables that must be involved in this evolutionary process: the rate of genomic mutations and the effectors sensed by bacterial TFs.


Contribution of genetic duplication to the evolution of transcriptional networks

There is compelling evidence suggesting that gene duplication is a major force explaining the growth of TRNs 8, and it is also expected that this process will affect the connectivity distribution of these networks 23; 24, as has been seen in other biological networks. Here, we evaluate this hypothesis using data from E. coli and B. subtilis by asking whether there is any coupling between the occurrence of TF duplication events and the role of transcription factors within regulatory networks. To accomplish this goal it was first necessary to classify TFs in terms of paralogy. As explained in Materials and Methods, in E. coli we predicted 24 groups of complete paralogs from a set of 85 TFs for which experimentally characterized bsDNAs are available. In B. subtilis we found 25 paralogous groups out of 91 TFs. In both cases there were a few TFs labeled as singletons, since no duplication evidence was found for them (15 in E. coli and 26 in B. subtilis).

Figure 1 shows that duplication events have occurred at all levels of TRNs, although they seem to be more frequent towards the low connectivity end of the regulatory hierarchy. This means that most TF duplication events have resulted in adding nodes to the base of the network, in agreement with recent observations 9. Furthermore, this figure shows that most global regulators belong to different paralogous groups in the two species studied here. With the exception of CRP and FNR in E. coli, most global regulators have paralogs in the network, which in contrast have local regulatory roles. For instance, ArcA has eight related known TFs in the E. coli network, all of them thought to be local regulators. In B. subtilis, CcpA is another remarkable example, with five other known regulatory proteins thought to be paralogously related. It is important to note that this methodology relies entirely on finding paralogous TFs and cannot separate duplication events from possible horizontal transfer events.

Figure 1
Paralogous groups of transcription factors in the TRNs from E. coli (a) and B. subtilis (b)

From these results it can be stated that identifying paralogy relationships neither helps us understand the role of TFs nor explains how network nodes become regulatory targets of previously existing TFs. In other words, we still need to know which distinctive properties of TFs make them more or less likely to gain or lose regulatory interactions, which is something known to be happening in evolution 25; 26. For this reason we focused on TF binding specificity, defined here as the ability of DNA-binding proteins to discriminate a small subset of DNA sequences from the vast repertoire of sequences found in a genome. There are different ways of approximating the specificity of DNA-binding proteins (see for instance 27; 28; 29). As explained in the next section, we tried different measures and obtained compatible results with all of them.

Specificity estimated through the observed diversity of DNA binding sites

A natural way of estimating the specificity of TFs is shown in Figure 2, provided that collections of binding sites are available. The actual property measured is the unadjusted information content (UIC) of sequence motifs, which is known to be a valid estimate of the relative specificity of DNA-binding proteins 30, commonly calculated for sequence logo representations of binding motifs. Both scatter plots show that the information content of sequence motifs is strongly and negatively correlated with the number of sites recognized by each TF. In other words, translating information content to specificity, proteins able to recognize many DNA sites show lower specificity than local regulators, which present high specificity. This result agrees with previous observations made by Sengupta and collaborators in E. coli 29. Since some TFs bind only to one or two sites, and others to more than a hundred different genomic positions, the number of sites was log-transformed for convenience. In addition, as sequence motifs have different widths, the information content in these figures was normalized by dividing the raw IC by the motif width, as explained in Materials and Methods. The correlation coefficient obtained for the E. coli data was −0.81 (pairs=67, R2=0.66, p<10E-16); for B. subtilis we also found a significant correlation coefficient of −0.81 (pairs=70, R2=0.66, p<10E-17). The results obtained with these two species, only remotely related to each other, suggest that this functional correlation between binding specificity and regulon size might be found in other bacterial species. However, other variables might be affecting the interpretation of these results, as discussed in the following paragraph.
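As an illustration of the normalized information content described above, the following minimal sketch computes the per-column information of a set of aligned sites and divides the total by the motif width. It assumes a uniform background over the four bases and applies no small-sample correction; the exact definition used in Materials and Methods may differ, and the function names column_ic and normalized_uic are hypothetical.

```python
import math
from collections import Counter

def column_ic(column):
    """Unadjusted information content (bits) of one alignment column.

    Assumes a uniform background (maximum of 2 bits per DNA column).
    """
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
    return 2.0 - entropy

def normalized_uic(sites):
    """Raw information content of aligned sites divided by the motif width."""
    if not sites:
        raise ValueError("at least one site is required")
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same width")
    return sum(column_ic(col) for col in zip(*sites)) / width
```

Under these assumptions a TF whose sites are all identical scores 2 bits per position, while a TF whose sites show every base equally often at every position scores 0.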

Figure 2
Scatter plot of normalized information content versus number of binding sites in E. coli (a) and B. subtilis (b)

For instance, the catalogue of TF binding sites is probably incomplete for most TFs and biased towards regulatory proteins that play a role in physiological conditions that are more easily reproduced in experimental labs. How would this affect the analysis? We approached this question by randomly sampling the collection of available sites in both model organisms. The idea was to repeat the analysis in Figure 2 after 100 rounds of resampling using only 30% of the reported sites for each TF. Of course this could only be done for TFs with at least 7 sites, but the resulting correlation coefficients are very similar in both species: −0.86 in E. coli (pairs=40, R2=0.74, p<10E-12) and −0.89 in B. subtilis (pairs=34, R2=0.79, p<10E-11). While this experiment shows that the number of available bsDNAs does not change the previously observed correlation between regulon size and TF specificity, it also proves that the actual IC measurements (i.e. specificities) may change depending on the collection of sites we have at hand. As an illustration, inspecting the data in Figure 2 we may conclude that DnaA has an IC of 0.71 in E. coli. However, if we take the mean IC after 100 random samples (Table 1) we might say that the specificity of DnaA is actually 1.12. If we must take these IC measurements as absolute values, then probably it is wiser to take the values compiled after sampling. Table 1 shows the specificity estimates in Figure 2 next to the mean IC after sampling.
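The resampling experiment can be sketched as follows. This is only an approximation of the procedure described above: it assumes sampling without replacement, rounds 30% of the sites to the nearest integer, and reuses the uniform-background information content as the specificity measure. The names normalized_uic and sampled_mean_ic and the seed parameter are hypothetical.

```python
import math
import random
from collections import Counter

def normalized_uic(sites):
    """Information content per motif position (uniform background assumed)."""
    width = len(sites[0])
    total = 0.0
    for col in zip(*sites):
        n = len(col)
        entropy = -sum((c / n) * math.log2(c / n) for c in Counter(col).values())
        total += 2.0 - entropy
    return total / width

def sampled_mean_ic(sites, fraction=0.3, rounds=100, min_sites=7, seed=0):
    """Mean normalized IC over repeated random subsets of the reported sites.

    Returns None for TFs with fewer than min_sites reported sites.
    """
    if len(sites) < min_sites:
        return None
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    k = max(1, round(fraction * len(sites)))
    values = [normalized_uic(rng.sample(sites, k)) for _ in range(rounds)]
    return sum(values) / rounds
```

Because small subsets tend to look less diverse than the full collection, sampled values are often higher than the value computed from all sites, which is consistent with the DnaA example (0.71 versus 1.12).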

Table 1
Normalized information content (specificity) of transcription factors in B. subtilis and E. coli with 7+ reported binding sites

The next variable considered was the geometry of the binding sites. Since TFs can bind to DNA in different ways (i.e. as monomers or dimers, with or without spacers), only the 10 most informative columns in each motif were taken in order to calculate the IC, ensuring a fair comparison of motifs. This approach would also compensate for potential errors in the annotation of motif widths. The analysis on the E. coli dataset yields a correlation coefficient of −0.82 (pairs=63, R2=0.67, p<10E-15). The picture is similar when using B. subtilis data, with a correlation coefficient of −0.79 (pairs=27, R2=0.63, p<10E-6). Again, a very significant correlation was found, reinforcing the initial observations.
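A minimal sketch of this column selection is given below. It assumes that the selected columns are averaged (the text does not state whether they were summed or averaged) and that motifs narrower than 10 columns contribute all of their columns; the function names are hypothetical.

```python
import math
from collections import Counter

def column_ic(column):
    """Information content (bits) of one column, uniform background assumed."""
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
    return 2.0 - entropy

def top_columns_ic(sites, k=10):
    """Mean IC of the k most informative columns of a set of aligned sites."""
    if not sites:
        raise ValueError("at least one site is required")
    ics = sorted((column_ic(col) for col in zip(*sites)), reverse=True)
    best = ics[:k]  # motifs narrower than k simply use every column
    return sum(best) / len(best)
```

Selecting a fixed number of columns makes motifs of different widths, or with uninformative spacers, directly comparable.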

Finally, we tried to estimate binding specificity using exactly two sites for each TF: the best and the worst sites when aligned to the corresponding sequence motif, in the form of a position-weight matrix. Here, the idea was to approximate the variability of sites recognized by any TF, expecting that highly specific proteins would bind to sites with similar scores, while LS regulators would recognize a broad range of sites. Thus, we calculated the PWM score variability for every TF finding once again significant correlations in both bacterial species with respect to the number of binding sites. In B. subtilis we find a correlation coefficient of 0.74 (pairs=46, R2=0.54, p<10E−8), compared to a coefficient of 0.91 (pairs=55, R2=0.83, p=0) in E. coli. It is important to note that the same picture holds when coefficients of variation, less sensitive to outliers, are calculated for each TF.
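The best-versus-worst site comparison can be sketched as follows. The scoring scheme here is an assumption, not the one documented in the paper: a log-odds position-weight matrix is built from the sites themselves with a pseudocount of 1 and a uniform background, and variability is taken as the difference between the highest and lowest site scores. All function names are hypothetical.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position-weight matrix from aligned sites (one dict per column)."""
    n = len(sites)
    pwm = []
    for col in zip(*sites):
        pwm.append({
            base: math.log2(
                ((col.count(base) + pseudocount) / (n + 4 * pseudocount)) / background
            )
            for base in "ACGT"
        })
    return pwm

def score(pwm, site):
    """Sum of the per-position log-odds weights for one site."""
    return sum(column[base] for column, base in zip(pwm, site))

def score_range(sites):
    """Difference between the best and the worst site score of a TF."""
    pwm = build_pwm(sites)
    scores = [score(pwm, s) for s in sites]
    return max(scores) - min(scores)
```

With this measure a TF whose sites are all identical has a range of zero, and the range widens as the collection of recognized sites becomes more diverse.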

Diversity of DNA binding structural potentials as a measure of binding specificity

A rather different method for estimating binding specificity is shown in Figure 3, where the crystallographic structures of 11 E. coli protein-DNA complexes were used to thread the collection of RegulonDB binding sites for each of them. This collection includes TrpR, Rob, PurR, PhoB, NarL, MetJ, MarA, FadR, DnaA, CRP and LacR. As explained in Materials and Methods, each sequence was scored in terms of an estimate of the structural binding potential, and the observed score diversity plotted against the number of recognized binding sites. Despite the small number of complexes available, we observe a correlation coefficient of 0.92 (pairs=11, R2=0.85, p=0.0004) between connectivity and the observed energy variability, supporting the hypothesis that global regulators are able to bind a larger collection of sites, at the cost of being less specific. These results provide new insights into the molecular recognition of DNA binding sites, suggesting that the array of interface contacts between protein and DNA counterparts, as captured in crystallographic complexes, can be utilized in order to estimate the specificity of TFs. Unfortunately, we cannot perform this analysis on B. subtilis due to the lack of structural data.

Figure 3
Scatter plot of binding energy variability versus log (number of binding sites), obtained from 11 E. coli TF-DNA complexes

Contact-based estimations of binding specificity

Inspired by a previous work by Luscombe 31, we attempted to classify TFs according to their ratio of specific to non-specific protein-DNA contacts. A key difference in this approach is that no binding site knowledge is used. Instead, a large collection of protein-DNA complexes is required in order to build comparative models of TFs, which are then used to identify amino acid residues that are likely to contact nitrogen bases at the interface (specific contacts), as opposed to non-specific contacts, that usually include phosphate and sugar atoms. Despite the fact that this approach ignores indirect DNA readout mechanisms, it was used to estimate the specificity of 82 transcription factors (49 from E. coli and 33 from B. subtilis), yielding no correlation between contact-based specificity and connectivity, presumably as a result of using approximate theoretical models, instead of crystallographic structures. However, global TFs display low specificities and therefore these somewhat low-resolution results give further support to our previous observations and are important as they show that similar conclusions might be reached using different data sources.

Adding co-regulation to binding specificity

So far these results suggest that highly connected TFs, those expected to have a larger impact on regulation, display relatively low binding specificities. However, by analyzing the curated data in RegulonDB 18 a more complex picture emerges, since a large fraction of E. coli promoters are subject to regulation by several TFs. Therefore, we should be studying binding specificity in the context of combinatorial regulation 32 (no such data is currently available for B. subtilis). Figure 4 shows a scatter plot of the number of co-regulators of TFs and the number of target genes in E. coli, revealing a correlation coefficient of 0.94 (pairs=153, R2=0.90, p=0). This clearly means that highly connected TFs, those that seem to be less able to discriminate DNA sequences, co-regulate more often than other TFs.

Figure 4
Scatter plot of co-regulators versus the number of regulated target genes in E. coli for each transcription factor

However, can this distribution of co-regulating TFs be explained in terms of random combinations? We find that 839/2861 (29%) of E. coli genes are subject to regulation by only one transcription factor. Conversely, 71% of the total number of genes is found to be regulated by two or more TFs. We can take these proportions in order to calculate the expected number of co-regulated TGs for any one TF. Consider the transcription factor NarL, known to affect the expression of 98 target genes. We should expect that around 70 of those genes are co-regulated by other TFs. However, RegulonDB reports that 96 of those TGs are actually co-regulated. What does this difference mean? If this calculation is done with all TFs in E. coli we obtain a table from which we can calculate the statistical significance of the differences between the expected and the observed co-regulation frequencies by means of a χ2 test. Using this test we find a very small probability (p<10E-7) that the observed differences happen by chance (if we take all TFs with 5 or more expected co-regulated TGs the probability is still p<10E-7). Please note that most global regulatory proteins (with the exception of FIS) actually co-regulate more genes than could be expected by chance.
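The expected-versus-observed comparison for a single TF can be sketched as follows, using only the NarL figures quoted above. The layout of the χ2 table is not given in the text, so treating each TF as two categories (co-regulated and singly regulated TGs) is an assumption, as are the function names.

```python
TOTAL_GENES = 2861       # regulated E. coli genes quoted in the text
SINGLE_TF_GENES = 839    # genes regulated by only one TF
P_COREG = 1 - SINGLE_TF_GENES / TOTAL_GENES  # about 0.71

def expected_coregulated(n_targets, p=P_COREG):
    """Expected number of co-regulated TGs if co-regulation were random."""
    return n_targets * p

def chi_square_term(n_targets, observed_coreg, p=P_COREG):
    """Chi-square contribution of one TF over two categories:
    co-regulated TGs and TGs regulated by this TF alone."""
    exp_coreg = n_targets * p
    exp_single = n_targets - exp_coreg
    obs_single = n_targets - observed_coreg
    return ((observed_coreg - exp_coreg) ** 2 / exp_coreg
            + (obs_single - exp_single) ** 2 / exp_single)

# NarL: 98 target genes, 96 of them reported as co-regulated
narl_expected = expected_coregulated(98)
narl_term = chi_square_term(98, 96)
```

Summing such terms over all TFs would give a statistic that can be compared against a χ2 distribution with the appropriate degrees of freedom, for instance with scipy.stats.chi2.sf.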

Since we have shown that highly connected TFs are less specific, these results can be interpreted as a sort of compensation mechanism: low specificity regulators have regulatory partners and, even if they can potentially bind to many DNA sequences, they will still need nearby co-regulating proteins in order to have an influence over transcription at several levels of the regulatory network. However, there are alternate ways of reading these results. Let us consider catabolite repression, which involves the preferential use of certain carbon sources over others when a mixture of them is available to the microorganism for growth, by means of co-regulation mechanisms 33. In E. coli, the transcriptional regulation of catabolite repression is carried out by CRP, a global regulator showing a low specificity (a sampled normUIC value of 0.39); however, 83% of its TGs are co-regulated by other TFs. This high rate of co-regulation may be understood by at least three mechanisms. Firstly, when complexed with its effector cAMP, CRP binds to binding sites in the promoter of some TGs, interacting directly with RNA polymerase to initiate transcription 34. Secondly, suboptimal cAMP-CRP binding sites may also be targeted by CRP homologues responding to other signals, for example the redox-sensor FNR, and vice versa, thus permitting a degree of cross-talk between bsDNAs belonging to promoters controlled by proteins of the same family 35. Thirdly, the cAMP-CRP complex may also interact with promoter-specific TFs, such as the nucleoside-regulator CytR, increasing the DNA-binding specificity of its co-regulator i) by providing additional contacts through its surface, ii) by creating a DNA conformation that is better recognized by the co-regulator, or iii) by inducing a conformational change in the co-regulator that promotes its interaction with the bsDNA 36; 37.
To summarize, the complexity of co-regulation in prokaryotes prevents the formulation of a more general hypothesis that would explain the observed correlation with binding specificity, particularly when bacterial regulators usually include, apart from the DNA-binding domain, an effector-sensing domain that responds to particular ecological cues.

Low specificity transcription factors show high expression levels

Different sources of evidence presented here suggest that binding specificity is an important property of transcription factors that might help explain their biology. One arising prediction is that LS regulatory proteins are more likely to bind to genomic DNA sites, since their repertoire of recognized sequences is comparatively larger. However, the concentration of these proteins must also be considered, as this will ultimately limit the number of genomic sites bound 38. The set of microarray experiments collected by Faith 39 allows us to check this prediction in E. coli, as they provide data for 60 non-redundant conditions. Indeed these data seem to support this hypothesis, as shown in Figure 5, in which mean normalized expression values for E. coli transcription factors are plotted against their number of reported binding sites, with a significant correlation coefficient of 0.66 (pairs=65, R2=0.43, p<10E-8). This scatter plot shows that regulators such as CRP, with 207 binding sites reported in the genome, are expressed at higher levels than AraC, with only 13 sites reported. This coupling between mRNA expression levels and regulon size is a novel observation in bacteria, and was also predicted, although with little support from the data, in recent experiments in yeast 40. However, this can only be indirect evidence, since we can merely infer transcription levels, not protein concentrations. Additional data, such as the rate of occupation of operator sites in the genome, would be required to further test the hypothesis.

Figure 5
Mean expression value of E. coli transcription factors (across 60 non-redundant microarray experiments) plotted versus the number of reported binding sites within the genome

DNA-binding specificity of orthologous transcription factors in E. coli and B. subtilis

The use of two bacterial models with remarkably different life styles 20 and long phylogenetic distance 21; 22 gives us the opportunity to explore our findings by comparing orthologous TFs. As listed in Table 2, we found eight pairs of orthologous TFs with two or more experimentally verified DNA binding sites. Here we examine these orthologous pairs in order to test whether global and local TFs really exhibit different specificities that can be compared across species. If we skip Lrp, a global regulatory protein in E. coli for which only one binding site is available in B. subtilis (AzlB), it is found that in 5 out of 7 cases the specificity estimates are congruent, as lower values correspond to more binding sites. The values for DnaA are not congruent, but in both genomes it is clearly a highly specific transcription factor, with values greater than 1.1. However, CytR and CcpA have very similar specificity values in both species while the regulon sizes are 10 and 48, respectively. We now look at these examples in more detail.

Table 2
Orthologous TFs shared between Escherichia coli and Bacillus subtilis

The first cases are LexA and DnaA, two regulators that respond to DNA cleavage in both bacteria and bind DNA with high specificity, suggesting that they are indeed local TFs with similar roles in different genomes. The second case is Fur, a local regulator in E. coli and B. subtilis that coordinates the expression of iron uptake and homeostasis pathways in response to available iron 41; 42; 43. Fur shows high specificity values in both organisms, as expected for such a specialized regulatory role.

The next cases are two orthologous TFs that are part of two-component regulatory systems. The first system, CpxR (CpxA) in E. coli, responds to several conditions associated with envelope stress, such as alkaline pH and overproduction of secreted proteins, and also to attachment of cells to surfaces or the assembly of structures on the cell surface, folding or degradation of misfolded proteins in the periplasm and pili subunits as well as monitoring of porin status 44. This system also responds to exposure to copper45 and EDTA46 in E. coli, while its B. subtilis counterpart YycF (YycG) is involved in the control of genes for cell wall metabolic processes, cell membrane composition and cell division47. The second, PhoB (PhoR), regulates the phosphate regulon in E. coli48, while its counterpart in B. subtilis, ResD (ResE), is involved in nitrate respiration in response to oxygen limitation or nitric oxide 49. Both orthologous TFs have high specificity values, as expected for local regulators, even when they can respond to different effectors.

The remaining orthologous TFs have different positional roles in both organisms. Let us first consider CcpA, which is a global regulator in B. subtilis, controlling carbon catabolite repression (as CRP in E. coli) 50 with a specificity estimate of 0.88, while the orthologous CytR, a local regulator in E. coli 37, has a similar specificity value of 0.85. As mentioned earlier, these appear to be incongruent specificity estimates, as CcpA is known to bind to 48 sites, while CytR binds to 10. However, it should be mentioned that CytR, in co-regulation with CRP, has been described as the most promiscuous DNA-binder of the LacI family 37.

Finally, we analyze the transcription factor FNR (fumarate and nitrate reduction), a global TF in E. coli (FNReco) which is local in B. subtilis (FNRbsu). FNReco has been extensively annotated in RegulonDB, while Reents and coworkers have exhaustively studied the FNRbsu regulon via transcriptomic analysis in combination with bioinformatics-based binding site prediction 16. From 35 TGs identified as part of the FNR regulon during the transition of B. subtilis to anaerobic growth conditions, only eight genes are seen to be directly regulated via a cis-acting FNRbsu box in the corresponding promoter regions, as demonstrated previously by Cruz-Ramos and coworkers via construction of fusions and mutant strains 51; 52. Indeed, the red dots in Figure 2 show that FNR has relatively low specificity in E. coli (sampled normUIC values of 0.63 for FNReco and 1.38 for FNRbsu), in agreement with the fact that FNR regulates a much larger set of genes in E. coli than in B. subtilis. The amino acid residues presumed to recognize specific FNR sites change from E. coli to B. subtilis, and as a consequence the sequence logos are partially different. However, we still do not know why this protein, which senses O2 via a Cysteine-[4Fe-4S]2+ cluster located in the amino terminus in FNReco 53 and the carboxyl terminus in FNRbsu 16, has a major regulatory role in E. coli and only a minor effect in the TRN of B. subtilis (see Table 2). We believe that the answer to this question lies in the ecological niches of both bacteria. E. coli has adapted to live inside the host’s gut and must be able to grow rapidly in the ileum under aerobic conditions but also in competition for limited nutrients under anaerobic conditions in the colon 54. Therefore, it seems that shifting between these two environments is part of the species lifestyle, and FNR regulates this by affecting the expression of 135 genes in E. coli 18. In contrast, B. subtilis usually dwells in the soil, where fluctuations in the availability of oxygen are not that frequent or periodic, depending mostly on the soil’s water content 20. Presumably this is why in this species FNR regulates the transcription of only 8 genes required for adaptation to low oxygen tension 16; 19.

To summarize, although orthologous proteins are generally thought to have the same function in different species, it has been previously reported that TFs are not conserved between phylogenetically distant species, especially the global regulators, which are gained or lost rapidly through evolution 10; 11; 55. Even at small phylogenetic distances, such as within Proteobacteria for E. coli or Firmicutes for B. subtilis, it has been found that global regulators do not necessarily share similar evolutionary histories, nor do they regulate similar metabolic responses 3; 12; 13; 14; 15; 16; 17. In this section we have presented a DNA-binding specificity assessment of the set of orthologous TFs present in E. coli and B. subtilis, suggesting that the correlations described throughout the paper can be of practical use for the task of characterizing the role of regulatory proteins in prokaryotes. Our data allow us to claim that it is possible to infer the function of a TF as global or local if we can confidently measure its binding specificity. However, the DNA-binding domain can only tell us about one half of the evolutionary and functional history of a bacterial TF. The sensing/allosteric domain is most likely the result of several evolutionary processes, perhaps dominated by the environmental relevance of the corresponding effector, as illustrated by the FNR analysis. In some cases, the evolutionary history of allosteric domains might be a much better guide in order to define the functional role of a TF, as perhaps the cases of CytR and CpxR suggest.

A conceptual model for the evolution of transcriptional regulatory networks

The presented results provide a picture of bacterial regulatory networks in which binding specificity is a predictor of the hierarchy of any TF. Our data suggest that the ability of TFs to conserve or gain new TGs is not inherited from their paralogous counterparts, but it is at least correlated to their power to discriminate DNA sequences. Here we approximated the specificity of transcription factors using three different approaches, observing that global regulators (including nucleoid-associated proteins 56) from two bacterial models with remarkably different life styles and long phylogenetic distance consistently display low binding specificities, and that specificity values of most orthologous TFs between E. coli and B. subtilis are congruent with their global or local role. We have also found that low specificity regulators are transcribed at relatively high levels in E. coli, perhaps as a consequence of these proteins not being co-localized with their TGs in the genome, suggesting that an efficient occupancy of binding sites may be achieved by high copy number instead 38; 40; 57. In addition, it is clear from Figure 4 that less specific TFs have more co-regulators, other TFs that help translate their global control to more specialized subsets of target genes, adding one more variable to this evolutionary scenario. However, it seems obvious that other variables will be conditioning the evolution of regulatory networks. Of special interest are variables that might be restricting or enhancing the ability of TFs to gain, conserve or even lose regulatory interactions.

For instance, the mechanisms that generate or delete genomic binding sites should also be considered to fully understand this question, as already envisaged by Sengupta and collaborators 29. In this respect, Figure 6 shows a scatter plot of the theoretically estimated probability of site generation and the number of cognate binding sites of transcription factors in both E. coli and B. subtilis, predicting that LS regulators are more likely to bind to DNA sites appearing as a result of point mutations. A protein such as CRP, able to recognize 90 different oligonucleotides, will bind a randomly generated sequence with a probability roughly two orders of magnitude larger than CaiF, able to discriminate only 2 sequences. A different view of the same numbers could be that poor DNA sequence discriminators, with large sets of target genes, are less vulnerable to random genomic mutations, since more mutations are needed to disable a binding site. Moreover, it should be noted that bacterial genomes are plastic and experience genomic rearrangements that modify the composition and orientation of operons, providing means for creating or destroying binding sites beyond point mutations 27; 58. Our specificity estimations might indicate that local regulators, on evolutionary time scales, are more likely to gain binding sites as a result of such genomic rearrangement events. However, this hypothesis would require further testing and we have no direct evidence supporting it.

Figure 6
Theoretical estimates of the probability of random generation of genomic binding sites in E. coli (A) and B. subtilis (B)

In addition, as bacterial regulators usually include a signal-sensing allosteric domain, it is likely that the metabolic and ecological relevance of these effectors will largely affect the evolution of TFs and their regulons. In other words, as introduced in the previous section, the evolutionary fate of transcription factors will depend on both the DNA-binding and the allosteric domains. We anticipate two ways in which sensing domains might have an impact on network evolution. Firstly, they might induce conformational changes in the attached DNA-binding domains upon binding of effector molecules. For instance, it has been demonstrated that CRP increases its specificity after binding to cyclic AMP molecules 34. Similar evidence has been found for LacI 59 or Cbl 60. In this sense, it seems that allosteric domains might be regulating specificity, somewhat compensating for the intrinsic promiscuity of some DNA-binding domains. Secondly, not all signals sensed by regulatory proteins are equally relevant for the adaptation of the species, nor do they evenly describe its ecological niche. This conceptual model predicts that TFs are more likely to conserve or gain new target genes if they increase adaptation by logically linking allosteric effectors to the expression of new regulatory targets or operons. In summary, the model in Figure 7 attempts to capture the evolutionary variables that make regulons grow and shrink between species, such as FNR in E. coli and B. subtilis, as bacteria tune their regulatory networks in order to better respond to their environment and their metabolic requirements.

Figure 7
Evolutionary model for regulatory networks


Regulatory network collection

We downloaded the transcriptional regulatory interactions of E. coli K12 from RegulonDB release 5.5 18. We also obtained the regulatory interactions of B. subtilis from the Database of transcriptional regulation in B. subtilis (DBTBS) release 4.1 19. Both databases compile experimental information curated from the literature. We considered only regulatory interactions where the DNA binding sites have been experimentally characterized. For E. coli we collected a total of 85 transcription factors regulating 1593 target genes through 1314 DNA binding sites, while we collected a total of 91 TFs regulating 732 TGs through 944 bsDNA in B. subtilis (see Table S1 from Supplementary Material).

Detection of paralogy and orthology of transcription factors

Search of paralogues

In order to detect possible TF duplication events in the genomes of E. coli and B. subtilis, we used both sequence and three-dimensional structural domain assignments of the proteins in the network as a measure of paralogy. Therefore, if two proteins had exactly the same domain composition and the same number of domains, we assumed that they were derived from genetic duplication of a common ancestor. As bacterial regulators usually have at least two protein domains, conservation of the DNA-binding domain was not considered sufficient to detect paralogy. We defined domains according to the structural annotation system of the SUPERFAMILY database 61, based on the domain classification scheme of SCOP 62, and according to the sequence annotations of the PFAM database 63. Both assignment schemes rely on the use of libraries of hidden Markov models (HMM) to represent domains.

We searched for protein domains in the complete genomes of E. coli and B. subtilis using HMMs taken from PFAM version 20.0 and SUPERFAMILY version 1.69, using the HMMER 2.3.1 program 64 with an expectation value ≤ 10⁻³. This cut-off value has been used previously to define TF families in bacteria 3; 65; 66, although it is less stringent than the E-value ≤ 10⁻⁴ used to reduce the total number of superfamilies assigned to major clades (Archaea, Bacteria, and Eukarya) by Yang and co-workers 21. E-values here also serve as a confidence level for every candidate identified as a paralogue within an organism.

Thus, we predict groups of paralogues that include the set of 85 known TFs and 1593 TGs of E. coli from RegulonDB release 5.5 and the set of 91 known TFs and 732 TGs of B. subtilis from DBTBS release 4.1. In order to group putative paralogous regulatory proteins, we required that each group included the same resulting members after both PFAM and SUPERFAMILY domain assignments, except in the cases of seven E. coli and one B. subtilis TFs that have no SUPERFAMILY assignments with our cut-off value. In those cases only PFAM assignments were considered in order to group them.
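The grouping criterion described above can be sketched as follows. This is only a minimal illustration, assuming that domain assignments have already been parsed from the HMM search output into one list of domain identifiers per protein. The function names, the protein names and the domain labels are hypothetical and are not part of the original pipeline.

```python
from collections import defaultdict

def group_by_domain_composition(assignments):
    """Group proteins with the same domain composition and domain count.

    assignments: dict mapping protein name -> list of domain identifiers.
    Returns a set of frozensets, one per group with at least two members.
    """
    groups = defaultdict(set)
    for protein, domains in assignments.items():
        if domains:  # proteins without any assignment cannot be grouped
            groups[tuple(sorted(domains))].add(protein)
    return {frozenset(m) for m in groups.values() if len(m) > 1}

def consistent_groups(pfam, superfamily):
    """Keep PFAM groups that SUPERFAMILY assignments also recover.

    Proteins with no SUPERFAMILY assignment are grouped on PFAM alone
    (assumed reading of the exception described in the text).
    """
    pfam_groups = group_by_domain_composition(pfam)
    sf_groups = group_by_domain_composition(superfamily)
    no_sf = {p for p in pfam if not superfamily.get(p)}
    kept = set()
    for group in pfam_groups:
        with_sf = group - no_sf
        if len(with_sf) < 2 or with_sf in sf_groups:
            kept.add(group)
    return kept

# Toy input with hypothetical protein and domain labels.
pfam = {"tfA": ["dbd1", "sensor1"], "tfB": ["sensor1", "dbd1"],
        "tfC": ["dbd1"], "tfD": ["dbd2", "sensor2"], "tfE": ["dbd2", "sensor2"]}
superfamily = {"tfA": ["sf1", "sf2"], "tfB": ["sf1", "sf2"],
               "tfC": ["sf1"], "tfD": ["sf1", "sf3"], "tfE": []}
paralogue_groups = consistent_groups(pfam, superfamily)
```

In this toy input tfA and tfB are grouped by both schemes, tfD and tfE are grouped on PFAM alone because tfE has no SUPERFAMILY assignment, and tfC remains ungrouped because sharing only the DNA-binding domain is not considered sufficient.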

Search of orthologues

The search for orthologues was carried out as reported previously 10, assigning functional roles to TFs in other genomes by first filtering intraspecific paralogues and then using an intersection of three criteria for the detection of orthology: (i) bi-directional best hits (BDBHs), (ii) coverage of BLASTP 67 pairwise alignments and (iii) conservation of PFAM domains. Accordingly, we identified orthologues as pairs of B. subtilis and E. coli proteins that satisfy the following conditions:

  1. Sequences of the target genome that have a BDBH in the query genome with a significant BLASTP E-value (<10⁻³).
  2. At least 70% of the query sequence is included in the BLASTP alignment.
  3. Target sequences share the PFAM domains of their query counterparts. Target sequences having one or more additional domains that match the orientation and arrangement of the query sequence and do not increase the total size of the protein by more than 100 residues were also considered in the analysis.
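The three conditions above can be combined as in the sketch below. It assumes that the best BLASTP hit of every protein has already been extracted in both directions as a tuple (hit, E-value, query coverage). The function names and the toy identifiers are hypothetical, and condition 3 is simplified: the allowance of up to 100 extra residues is not modelled.

```python
def bidirectional_best_hits(query_to_target, target_to_query,
                            max_evalue=1e-3, min_coverage=0.70):
    """Return sorted (query, target) pairs that are reciprocal best hits.

    Both arguments map a protein to (best_hit, evalue, query_coverage),
    where query_coverage is the fraction of the query in the alignment.
    """
    pairs = []
    for query, (target, evalue, coverage) in query_to_target.items():
        back = target_to_query.get(target)
        if back is None or back[0] != query:
            continue  # condition 1: the hit must be reciprocal
        if evalue >= max_evalue:
            continue  # condition 1: significant E-value
        if coverage < min_coverage:
            continue  # condition 2: at least 70% of the query aligned
        pairs.append((query, target))
    return sorted(pairs)

def share_domains(query_domains, target_domains):
    # Condition 3, simplified: the target has every domain of the query.
    return set(query_domains) <= set(target_domains)

def orthologues(query_to_target, target_to_query, query_domains, target_domains):
    """Intersect the BDBH, coverage and domain-conservation criteria."""
    return [(q, t)
            for q, t in bidirectional_best_hits(query_to_target, target_to_query)
            if share_domains(query_domains.get(q, []), target_domains.get(t, []))]

# Toy input with hypothetical identifiers.
q2t = {"ecA": ("bsA", 1e-40, 0.90),   # reciprocal, significant, well covered
       "ecB": ("bsB", 1e-10, 0.50),   # reciprocal but only 50% coverage
       "ecC": ("bsA", 1e-5, 0.80)}    # not reciprocal
t2q = {"bsA": ("ecA", 1e-38, 0.88), "bsB": ("ecB", 1e-9, 0.55)}
pairs = orthologues(q2t, t2q, {"ecA": ["d1", "d2"]}, {"bsA": ["d1", "d2", "d3"]})
```

Only the pair ecA and bsA passes all three filters in this toy input.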

Estimation of transcription factor specificity based on the information content of DNA sequence motifs

Here we describe a way to estimate the observed DNA binding specificity of transcription factors for which we have at least two experimentally characterized binding sites. The process is essentially the same for our two bacterial datasets, with minor differences justified by the different annotation detail of E. coli and B. subtilis sites.

For E. coli we had a collection of 67 TFs with at least 2 reported sites, with 25 having more than 10 annotated sites. We used the computer program CONSENSUS 68 to build optimized sequence motifs with equiprobable prior nucleotide frequencies. We used the motif widths defined in RegulonDB 5.5 for each TF. CONSENSUS returns the unadjusted information content (UIC) for each motif, which can be width-normalized so that different motifs can be directly compared, using the expression ICnorm = IC / width. This is necessary because the motifs used in this work have widths that range from 7 (for instance NarL) to more than 20, and this variable ultimately limits the information content of motifs.
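The width normalization can be illustrated with a short sketch. It is not a reimplementation of CONSENSUS: it assumes the sites are already aligned and of equal width, uses observed frequencies without any small-sample correction, and reports information in bits, so absolute values may differ from the program's output. The function names are hypothetical.

```python
import math
from collections import Counter

def information_content(sites, prior=None):
    """Sum over columns of sum_b f_b * log2(f_b / p_b) for aligned sites."""
    if prior is None:
        # Equiprobable prior nucleotide frequencies, as used for E. coli.
        prior = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same width")
    total = 0.0
    for column in zip(*sites):
        for base, count in Counter(column).items():
            freq = count / len(sites)
            total += freq * math.log2(freq / prior[base])
    return total

def normalized_ic(sites, prior=None):
    """ICnorm = IC / width, so motifs of different width can be compared."""
    return information_content(sites, prior) / len(sites[0])

# Toy motif: the first column is fully conserved, the second is split.
ic = information_content(["AC", "AG"])
ic_norm = normalized_ic(["AC", "AG"])
```

With an equiprobable prior, a fully conserved column contributes 2 bits and a column split evenly between two bases contributes 1 bit, so the toy motif has IC = 3 bits and ICnorm = 1.5 bits per position.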

For B. subtilis we had a collection of 70 TFs with a minimum of 2 known sites, of which 23 have more than 10 associated sites, all extracted from DBTBS 4.1. Since sites for the same TF can have different widths in this data source, we used the program WCONSENSUS 68 to build sequence motifs with a prior %GC content of 43. This program attempts to find the optimal motif width in terms of information content.

In order to estimate the variability of scores for sites recognized by every TF, we took the position weight matrices (PWM) generated by CONSENSUS (E. coli) and WCONSENSUS (B. subtilis) and aligned all available sites for each TF against them, by running the program PATSER 68 and recording the scores. The highest and lowest scores were kept, as well as the standard deviation, and the variability was calculated with Equation 1:

(Equation 1)

Note that these variability measurements are normalized by the standard deviation of scores for a given TF, so they are comparable for different TFs.
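The scoring and normalization steps can be sketched as below, with an important caveat: Equation 1 is not reproduced in this text, so the form used here (score range divided by the standard deviation of the scores) is only an assumption consistent with the description above and may not match the authors' expression. The log-odds matrix with a pseudocount is likewise a generic stand-in for the PATSER scoring, and all function names are hypothetical.

```python
import math
import statistics
from collections import Counter

def log_odds_matrix(sites, pseudocount=1.0, prior=0.25):
    """Build a log-odds weight matrix (natural log) from aligned sites."""
    matrix = []
    denominator = len(sites) + 4 * pseudocount
    for column in zip(*sites):
        counts = Counter(column)
        matrix.append({base: math.log(((counts[base] + pseudocount) / denominator) / prior)
                       for base in "ACGT"})
    return matrix

def score_site(site, matrix):
    """Sum the weights of the bases observed at each motif position."""
    return sum(matrix[i][base] for i, base in enumerate(site))

def score_variability(scores):
    """ASSUMED form of Equation 1: (max - min) / standard deviation."""
    sd = statistics.stdev(scores)
    if sd == 0:
        raise ValueError("variability is undefined when all scores are equal")
    return (max(scores) - min(scores)) / sd

# Toy sites for one hypothetical TF.
sites = ["AAAA", "AAAA", "AAAT"]
pwm = log_odds_matrix(sites)
scores = [score_site(site, pwm) for site in sites]
variability = score_variability(scores)
```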

Estimation of transcription factor specificity by estimating DNA binding potential

A modified version of the DNASITE program 69, which uses full atom detail and identifies hydrogen bonds and hydrophobic interactions, was used to estimate DNA-binding potentials (manuscript under review). Briefly, the program threads experimentally characterized DNA binding sites from RegulonDB 5.5 into crystallographic protein-DNA complexes for 11 transcription factors in E. coli and scores each site using H-bond and van der Waals weight matrices. These matrices give log-likelihood scores to pairs of interacting atoms in the protein-DNA interface and were compiled on a set of non-redundant protein-DNA complexes. The sum of weights over a protein-DNA interface, linearly combined with indirect readout DNA deformation, is regarded as the potential of binding of a given site. As before, we calculate score variability for a TF using Equation 1. These are the eleven TFs used here, with the number of binding sites for each indicated in parentheses: TrpR (10), Rob (6), PurR (15), PhoB (16), NarL (73), MetJ (23), MarA (13), FadR (10), DnaA (8), CRP (182) and LacR (3). The list of corresponding Protein Data Bank complexes is: 1TRO 70, 1D5Y 71, 2PUA 72, 1GXP 73, 1JE8 74, 1CMA 75, 1XS9 76, 1H9T 77, 1J1V 78, 1CGP 79 and 1EFA 80.

Estimation of mean expression values from microarray experiments

A set of 60 published non-redundant expression profiles for E. coli was provided by the authors 39, already normalized using the robust multi-array analysis (RMA) procedure, that allows direct comparisons between them. Most of these conditions are independent single-gene over-expression experiments. The mean expression value across 60 conditions was then calculated for all those E. coli transcription factors for which an information content estimate of specificity was available, to produce the scatter plot shown in Figure 7.

Calculation of correlation coefficients

All correlation coefficients mentioned in this paper correspond to Pearson coefficients calculated using the function cor.test in the R environment for statistical computing (http://www.r-project.org/).
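For readers who want to check a coefficient by hand, the Pearson coefficient can be computed as in this sketch. It is written in Python rather than R, returns only the coefficient (cor.test additionally reports a significance test), and the function name is hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# A perfectly decreasing relationship gives a coefficient of -1.
r = pearson_r([1, 2, 3], [3, 2, 1])
```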

Calculation of probabilities of site generation

The collection of binding sites for every TF was aligned using CONSENSUS with a fixed motif width of 10 columns, to make them all directly comparable. Alignments are then parsed in order to count the number of different sites of length 10 found, a number called diffN, that is an approximation of the sequence space recognized by any TF. The probability of generating sites for any one TF is then calculated by dividing diffN by 4¹⁰, the total number of possible oligonucleotides of that length.
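The calculation reduces to a count and a division, as in the following sketch. It assumes the sites have already been aligned and trimmed to the fixed width; the function names are hypothetical.

```python
def count_distinct_sites(aligned_sites, width=10):
    """Return diffN, the number of different sites of the given width."""
    distinct = set()
    for site in aligned_sites:
        if len(site) != width:
            raise ValueError(f"expected width {width}, got {len(site)}")
        distinct.add(site.upper())  # treat upper and lower case as the same site
    return len(distinct)

def site_generation_probability(diff_n, width=10):
    """diffN divided by the number of possible oligonucleotides, 4**width."""
    return diff_n / 4 ** width

# Toy alignment: two of the three sites are identical apart from case.
diff_n = count_distinct_sites(["ACGTACGTAC", "acgtacgtac", "TTTTTTTTTT"])
probability = site_generation_probability(diff_n)
```

With the diffN values quoted earlier in the text, site_generation_probability(90) is about 8.6e-05 for CRP and site_generation_probability(2) is about 1.9e-06 for CaiF, since 4**10 equals 1048576.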

Supplementary Material



We thank Heladia Salgado and Sarath Chandra Janga for their help in obtaining RegulonDB and microarray expression data. We are also grateful to the Computational Genomics Group and an anonymous referee for comments and suggestions to improve this work. The Computational Genomics group is supported by NIH grant RO1-GM071962. B.C.M. was funded by a postdoctoral fellowship from Universidad Nacional Autónoma de México and by Fundación Aragón I+D. V.E.A. was supported by Red Iberoamericana de Bioinformática and CYTED and is now recipient of a doctoral fellowship awarded by Banco Santander Central Hispano, Fundación Carolina and Universidad de Zaragoza.

Abbreviations footnote

TF: Transcription Factor
TG: Target Gene
TRN: Transcriptional Regulatory Network
bsDNA: DNA binding site
IC: information content
PWM: position-weight matrix
BDBH: bi-directional best hit
HS: high specificity
LS: low specificity


Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


1. Thieffry D, Huerta AM, Perez-Rueda E, Collado-Vides J. From specific gene regulation to genomic networks: a global analysis of transcriptional regulation in Escherichia coli. Bioessays. 1998;20:433–440. [PubMed]
2. Guelzim N, Bottani S, Bourgine P, Kepes F. Topological and causal structure of the yeast transcriptional regulatory network. Nat Genet. 2002;31:60–63. [PubMed]
3. Moreno-Campuzano S, Janga SC, Perez-Rueda E. Identification and analysis of DNA-binding transcription factors in Bacillus subtilis and other Firmicutes--a genomic approach. BMC Genomics. 2006;7:147. [PMC free article] [PubMed]
4. Barabasi AL, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–512. [PubMed]
5. Gottesman S. Bacterial regulation: global regulatory networks. Annu Rev Genet. 1984;18:415–441. [PubMed]
6. Martinez-Antonio A, Collado-Vides J. Identifying global regulators in transcriptional regulatory networks in bacteria. Curr Opin Microbiol. 2003;6:482–489. [PubMed]
7. Foster DV, Kauffman SA, Socolar JE. Network growth models and genetic regulatory networks. Phys Rev E Stat Nonlin Soft Matter Phys. 2006;73:031912. [PubMed]
8. Teichmann SA, Babu MM. Gene regulatory network growth by duplication. Nat Genet. 2004;36:492–496. [PubMed]
9. Cosentino Lagomarsino M, Jona P, Bassetti B, Isambert H. Hierarchy and feedback in the evolution of the Escherichia coli transcription network. Proc Natl Acad Sci U S A. 2007;104:5516–5520. [PMC free article] [PubMed]
10. Lozada-Chavez I, Janga SC, Collado-Vides J. Bacterial regulatory networks are extremely flexible in evolution. Nucleic Acids Res. 2006;34:3434–3445. [PMC free article] [PubMed]
11. Madan Babu M, Teichmann SA, Aravind L. Evolutionary dynamics of prokaryotic transcriptional regulatory networks. J Mol Biol. 2006;358:614–633. [PubMed]
12. Tobisch S, Zuhlke D, Bernhardt J, Stulke J, Hecker M. Role of CcpA in regulation of the central pathways of carbon catabolism in Bacillus subtilis. J Bacteriol. 1999;181:6996–7004. [PMC free article] [PubMed]
13. Morales G, Linares JF, Beloso A, Albar JP, Martinez JL, Rojo F. The Pseudomonas putida Crc global regulator controls the expression of genes from several chromosomal catabolic pathways for aromatic compounds. J Bacteriol. 2004;186:1337–1344. [PMC free article] [PubMed]
14. Friedberg D, Midkiff M, Calvo JM. Global versus local regulatory roles for Lrp-related proteins: Haemophilus influenzae as a case study. J Bacteriol. 2001;183:4004–4011. [PMC free article] [PubMed]
15. Suh SJ, Runyen-Janecky LJ, Maleniak TC, Hager P, MacGregor CH, Zielinski-Mozny NA, Phibbs PV, Jr, West SE. Effect of vfr mutation on global gene expression and catabolite repression control of Pseudomonas aeruginosa. Microbiology. 2002;148:1561–1569. [PubMed]
16. Reents H, Munch R, Dammeyer T, Jahn D, Hartig E. The Fnr regulon of Bacillus subtilis. J Bacteriol. 2006;188:1103–1112. [PMC free article] [PubMed]
17. Derouaux A, Dehareng D, Lecocq E, Halici S, Nothaft H, Giannotta F, Moutzourelis G, Dusart J, Devreese B, Titgemeyer F, Van Beeumen J, Rigali S. Crp of Streptomyces coelicolor is the third transcription factor of the large CRP-FNR superfamily able to bind cAMP. Biochem Biophys Res Commun. 2004;325:983–990. [PubMed]
18. Salgado H, Gama-Castro S, Peralta-Gil M, Diaz-Peredo E, Sanchez-Solano F, Santos-Zavaleta A, Martinez-Flores I, Jimenez-Jacinto V, Bonavides-Martinez C, Segura-Salazar J, Martinez-Antonio A, Collado-Vides J. RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions. Nucleic Acids Res. 2006;34:D394–D397. [PMC free article] [PubMed]
19. Makita Y, Nakao M, Ogasawara N, Nakai K. DBTBS: database of transcriptional regulation in Bacillus subtilis and its contribution to comparative genomics. Nucleic Acids Res. 2004;32:D75–D77. [PMC free article] [PubMed]
20. Nakano MM, Zuber P. Anaerobic growth of a "strict aerobe" (Bacillus subtilis). Annu Rev Microbiol. 1998;52:165–190. [PubMed]
21. Yang S, Doolittle RF, Bourne PE. Phylogeny determined by protein domain content. Proc Natl Acad Sci U S A. 2005;102:373–378. [PMC free article] [PubMed]
22. Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P. Toward automatic reconstruction of a highly resolved tree of life. Science. 2006;311:1283–1287. [PubMed]
23. Amoutzias GD, Weiner J, Bornberg-Bauer E. Phylogenetic profiling of protein interaction networks in eukaryotic transcription factors reveals focal proteins being ancestral to hubs. Gene. 2005;347:247–253. [PubMed]
24. Madan Babu M, Teichmann SA. Evolution of transcription factors and the gene regulatory network in Escherichia coli. Nucleic Acids Res. 2003;31:1234–1244. [PMC free article] [PubMed]
25. Moses AM, Pollard DA, Nix DA, Iyer VN, Li XY, Biggin MD, Eisen MB. Large-scale turnover of functional transcription factor binding sites in Drosophila. PLoS Comput Biol. 2006;2:e130. [PMC free article] [PubMed]
26. Doniger SW, Fay JC. Frequent gain and loss of functional transcription factor binding sites. PLoS Comput Biol. 2007;3:e99. [PMC free article] [PubMed]
27. Espinosa V, Gonzalez AD, Vasconcelos AT, Huerta AM, Collado-Vides J. Comparative studies of transcriptional regulation mechanisms in a group of eight gamma-proteobacterial genomes. J Mol Biol. 2005;354:184–199. [PubMed]
28. Rajewsky N, Socci ND, Zapotocky M, Siggia ED. The evolution of DNA regulatory regions for proteo-gamma bacteria by interspecies comparisons. Genome Res. 2002;12:298–308. [PMC free article] [PubMed]
29. Sengupta AM, Djordjevic M, Shraiman BI. Specificity and robustness in transcription control networks. Proc Natl Acad Sci U S A. 2002;99:2072–2077. [PMC free article] [PubMed]
30. Stormo GD, Fields DS. Specificity, free energy and information content in protein-DNA interactions. Trends Biochem Sci. 1998;23:109–113. [PubMed]
31. Luscombe NM, Thornton JM. Protein-DNA interactions: amino acid conservation and the effects of mutations on binding specificity. J Mol Biol. 2002;320:991–1009. [PubMed]
32. Bilu Y, Barkai N. The design of transcription-factor binding sites is affected by combinatorial regulation. Genome Biol. 2005;6:R103. [PMC free article] [PubMed]
33. Cases I, de Lorenzo V. Expression systems and physiological control of promoter activity in bacteria. Curr Opin Microbiol. 1998;1:303–310. [PubMed]
34. Garges S, Adhya S. Cyclic AMP-induced conformational change of cyclic AMP receptor protein (CRP): intragenic suppressors of cyclic AMP-independent CRP mutations. J Bacteriol. 1988;170:1417–1422. [PMC free article] [PubMed]
35. Sawers G, Kaiser M, Sirko A, Freundlich M. Transcriptional activation by FNR and CRP: reciprocity of binding-site recognition. Mol Microbiol. 1997;23:835–845. [PubMed]
36. Kallipolitis BH, Norregaard-Madsen M, Valentin-Hansen P. Protein-protein communication: structural model of the repression complex formed by CytR and the global regulator CRP. Cell. 1997;89:1101–1109. [PubMed]
37. Pedersen H, Valentin-Hansen P. Protein-induced fit: the CRP activator protein changes sequence-specific DNA recognition by the CytR repressor, a highly flexible LacI member. Embo J. 1997;16:2108–2118. [PMC free article] [PubMed]
38. Evangelisti AM, Wagner A. Molecular evolution in the yeast transcriptional regulation network. J Exp Zoolog B Mol Dev Evol. 2004;302:392–411. [PubMed]
39. Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 2007;5:e8. [PMC free article] [PubMed]
40. Aurell E, d'Herouel AF, Malmnas C, Vergassola M. Transcription factor concentrations versus binding site affinities in the yeast S. cerevisiae. Phys Biol. 2007;4:134–143. [PubMed]
41. Bagg A, Neilands JB. Ferric uptake regulation protein acts as a repressor, employing iron (II) as a cofactor to bind the operator of an iron transport operon in Escherichia coli. Biochemistry. 1987;26:5471–5477. [PubMed]
42. Baichoo N, Wang T, Ye R, Helmann JD. Global analysis of the Bacillus subtilis Fur regulon and the iron starvation stimulon. Mol Microbiol. 2002;45:1613–1629. [PubMed]
43. Ollinger J, Song KB, Antelmann H, Hecker M, Helmann JD. Role of the Fur regulon in iron transport in Bacillus subtilis. J Bacteriol. 2006;188:3664–3673. [PMC free article] [PubMed]
44. Batchelor E, Walthers D, Kenney LJ, Goulian M. The Escherichia coli CpxA-CpxR envelope stress response system regulates expression of the porins ompF and ompC. J Bacteriol. 2005;187:5723–5731. [PMC free article] [PubMed]
45. Yamamoto K, Ishihama A. Characterization of copper-inducible promoters regulated by CpxA/CpxR in Escherichia coli. Biosci Biotechnol Biochem. 2006;70:1688–1695. [PubMed]
46. DiGiuseppe PA, Silhavy TJ. Signal detection and target gene induction by the CpxRA two-component system. J Bacteriol. 2003;185:2432–2440. [PMC free article] [PubMed]
47. Howell A, Dubrac S, Noone D, Varughese KI, Devine K. Interactions between the YycFG and PhoPR two-component systems in Bacillus subtilis: the PhoR kinase phosphorylates the non-cognate YycF response regulator upon phosphate limitation. Mol Microbiol. 2006;59:1199–1215. [PubMed]
48. Makino K, Shinagawa H, Amemura M, Kawamoto T, Yamada M, Nakata A. Signal transduction in the phosphate regulon of Escherichia coli involves phosphotransfer between PhoR and PhoB proteins. J Mol Biol. 1989;210:551–559. [PubMed]
49. Baruah A, Lindsey B, Zhu Y, Nakano MM. Mutational analysis of the signal-sensing domain of ResE histidine kinase from Bacillus subtilis. J Bacteriol. 2004;186:1694–1704. [PMC free article] [PubMed]
50. Lulko AT, Buist G, Kok J, Kuipers OP. Transcriptome analysis of temporal regulation of carbon metabolism by CcpA in Bacillus subtilis reveals additional target genes. J Mol Microbiol Biotechnol. 2007;12:82–95. [PubMed]
51. Cruz Ramos H, Hoffmann T, Marino M, Nedjari H, Presecan-Siedel E, Dreesen O, Glaser P, Jahn D. Fermentative metabolism of Bacillus subtilis: physiology and regulation of gene expression. J Bacteriol. 2000;182:3072–3080. [PMC free article] [PubMed]
52. Cruz Ramos H, Boursier L, Moszer I, Kunst F, Danchin A, Glaser P. Anaerobic transcription activation in Bacillus subtilis: identification of distinct FNR-dependent and -independent regulatory mechanisms. Embo J. 1995;14:5984–5994. [PMC free article] [PubMed]
53. Khoroshilova N, Popescu C, Munck E, Beinert H, Kiley PJ. Iron-sulfur cluster disassembly in the FNR protein of Escherichia coli by O2: [4Fe-4S] to [2Fe-2S] conversion with loss of biological activity. Proc Natl Acad Sci U S A. 1997;94:6087–6092. [PMC free article] [PubMed]
54. Schaechter M. Escherichia coli, General Biology. In: Lederberg J, editor. Encyclopedia of Microbiology. Second Edition. Vol. 1 A–C. New York: Academic Press; 2000. pp. 260–269.
55. Price MN, Dehal PS, Arkin AP. Orthologous transcription factors in bacteria have different functions and regulate different genes. PLoS Comput Biol. 2007;3:1739–1750. [PMC free article] [PubMed]
56. Dame RT. The role of nucleoid-associated proteins in the organization and compaction of bacterial chromatin. Mol Microbiol. 2005;56:858–870. [PubMed]
57. Kolesov G, Wunderlich Z, Laikova ON, Gelfand MS, Mirny LA. How gene order is influenced by the biophysics of transcription regulation. Proc Natl Acad Sci U S A. 2007;104:13948–13953. [PMC free article] [PubMed]
58. Watanabe H, Mori H, Itoh T, Gojobori T. Genome plasticity as a paradigm of eubacteria evolution. J Mol Evol. 1997;44 Suppl 1:S57–S64. [PubMed]
59. Daber R, Stayrook S, Rosenberg A, Lewis M. Structural analysis of lac repressor bound to allosteric effectors. J Mol Biol. 2007;370:609–619. [PMC free article] [PubMed]
60. Stec E, Witkowska-Zimny M, Hryniewicz MM, Neumann P, Wilkinson AJ, Brzozowski AM, Verma CS, Zaim J, Wysocki S, Bujacz GD. Structural basis of the sulphate starvation response in E. coli: crystal structure and mutational analysis of the cofactor-binding domain of the Cbl transcriptional regulator. J Mol Biol. 2006;364:309–322. [PubMed]
61. Gough J, Chothia C. SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments. Nucleic Acids Res. 2002;30:268–272. [PMC free article] [PubMed]
62. Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995;247:536–540. [PubMed]
63. Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL, Marshall M, Sonnhammer EL. The Pfam protein families database. Nucleic Acids Res. 2002;30:276–280. [PMC free article] [PubMed]
64. Eddy SR. Hidden Markov models. Curr Opin Struct Biol. 1996;6:361–365. [PubMed]
65. Perez-Rueda E, Collado-Vides J. The repertoire of DNA-binding transcriptional regulators in Escherichia coli K-12. Nucleic Acids Res. 2000;28:1838–1847. [PMC free article] [PubMed]
66. Perez-Rueda E, Collado-Vides J, Segovia L. Phylogenetic distribution of DNA-binding transcription factors in bacteria and archaea. Comput Biol Chem. 2004;28:341–350. [PubMed]
67. Lopez R, Silventoinen V, Robinson S, Kibria A, Gish W. WU-Blast2 server at the European Bioinformatics Institute. Nucleic Acids Res. 2003;31:3795–3798. [PMC free article] [PubMed]
68. Hertz GZ, Stormo GD. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics. 1999;15:563–577. [PubMed]
69. Contreras-Moreira B, Collado-Vides J. Comparative footprinting of DNA-binding proteins. Bioinformatics. 2006;22:e74–e80. [PubMed]
70. Otwinowski Z, Schevitz RW, Zhang RG, Lawson CL, Joachimiak A, Marmorstein RQ, Luisi BF, Sigler PB. Crystal structure of trp repressor/operator complex at atomic resolution. Nature. 1988;335:321–329. [PubMed]
71. Kwon HJ, Bennik MH, Demple B, Ellenberger T. Crystal structure of the Escherichia coli Rob transcription factor in complex with DNA. Nat Struct Biol. 2000;7:424–430. [PubMed]
72. Schumacher MA, Choi KY, Zalkin H, Brennan RG. Crystal structure of LacI member, PurR, bound to DNA: minor groove binding by alpha helices. Science. 1994;266:763–770. [PubMed]
73. Blanco AG, Sola M, Gomis-Ruth FX, Coll M. Tandem DNA recognition by PhoB, a two-component signal transduction transcriptional activator. Structure. 2002;10:701–713. [PubMed]
74. Maris AE, Sawaya MR, Kaczor-Grzeskowiak M, Jarvis MR, Bearson SM, Kopka ML, Schroder I, Gunsalus RP, Dickerson RE. Dimerization allows DNA target site recognition by the NarL response regulator. Nat Struct Biol. 2002;9:771–778. [PubMed]
75. Somers WS, Phillips SE. Crystal structure of the met repressor-operator complex at 2.8 A resolution reveals DNA recognition by beta-strands. Nature. 1992;359:387–393. [PubMed]
76. Dangi B, Gronenborn AM, Rosner JL, Martin RG. Versatility of the carboxy-terminal domain of the alpha subunit of RNA polymerase in transcriptional activation: use of the DNA contact site as a protein contact site for MarA. Mol Microbiol. 2004;54:45–59. [PubMed]
77. van Aalten DM, DiRusso CC, Knudsen J. The structural basis of acyl coenzyme A-dependent regulation of the transcription factor FadR. Embo J. 2001;20:2041–2050. [PMC free article] [PubMed]
78. Fujikawa N, Kurumizaka H, Nureki O, Terada T, Shirouzu M, Katayama T, Yokoyama S. Structural basis of replication origin recognition by the DnaA protein. Nucleic Acids Res. 2003;31:2077–2086. [PMC free article] [PubMed]
79. Schultz SC, Shields GC, Steitz TA. Crystal structure of a CAP-DNA complex: the DNA is bent by 90 degrees. Science. 1991;253:1001–1007. [PubMed]
80. Bell CE, Lewis M. A closer view of the conformation of the Lac repressor bound to operator. Nat Struct Biol. 2000;7:209–214. [PubMed]