Copyright © 2009 Harari et al; licensee BioMed Central Ltd.

Identifying promoter features of co-regulated genes with similar network motifs

1 Department of Computer Science and Artificial Intelligence, University of Granada, c/. Daniel Saucedo Aranda, s/n 18071, Granada, Spain
2 Department of Molecular Cell Biology, Samsung Biomedical Research Institute, Sungkyunkwan University School of Medicine, Suwon 440-746, South Korea
3 Department of Molecular Microbiology, Washington University School of Medicine, Campus Box 8230, 660 S. Euclid Ave., St. Louis, Missouri, 63110, USA
4 Department of Molecular Microbiology, Washington University School of Medicine, Howard Hughes Medical Institute, Campus Box 8230, 660 South Euclid Avenue, St. Louis, Missouri, 63110-1093, USA

Corresponding author.
Oscar Harari: oharari/at/decsai.ugr.es; Coral del Val: delval/at/decsai.ugr.es; Rocío Romero-Zaliz: rocio/at/decsai.ugr.es; Dongwoo Shin: dshin/at/med.skku.ac.kr; Henry Huang: huang/at/borcim.wustl.edu; Eduardo A Groisman: groisman/at/borcim.wustl.edu; Igor Zwir: zwir/at/borcim.wustl.edu

Supplement: Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2008. Sun Kim. http://www.biomedcentral.com/content/pdf/1471-2105-10-S4-info.pdf
Conference: IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2008. 3–5 November 2008, Philadelphia, PA, USA

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background
A large amount of computational and experimental work has been devoted to uncovering network motifs in gene regulatory networks.
The leading hypothesis is that evolutionary processes independently selected recurrent architectural relationships among regulators and target genes (motifs) to produce characteristic expression patterns of their members. However, even with the same architecture, the genes may still be differentially expressed. Therefore, to define fully the expression of a group of genes, the strength of the connections in a network motif must be specified, and the cis-promoter features that participate in the regulation must be determined.

Results
We have developed a model-based approach to analyze proteobacterial genomes for promoter features that is specifically designed to account for the variability in sequence, location and topology intrinsic to differential gene expression. We provide methods for annotating regulatory regions by detecting their underlying cis-features. This includes identifying binding sites for a transcriptional regulator, distinguishing between activation and repression sites, between direct and reverse orientation, and among sequences that weakly reflect a particular pattern; binding sites for the RNA polymerase, characterizing their different classes and their locations relative to the transcription factor binding sites; the presence of riboswitches in the 5'UTR; and binding sites for other transcription factors. We applied our approach to characterize network motifs controlled by the PhoP/PhoQ regulatory system of Escherichia coli and Salmonella enterica serovar Typhimurium. We identified key features that enable the PhoP protein to control its target genes, and found that distinct features may produce different expression patterns even within the same network motif.

Conclusion
Global transcriptional regulators control multiple promoters by a variety of network motifs. This is clearly the case for the regulatory protein PhoP.
In this work, we studied this regulatory protein and demonstrated that understanding gene expression requires identifying not only a set of connexions or network motif, but also the cis-acting elements participating in each of these connexions.

Background
Transcription regulatory networks can be represented as directed graphs in which a node stands for a gene (or an operon in the case of bacteria) and an edge symbolizes a direct transcriptional interaction. Recurrent patterns of interactions, termed network motifs, occur far more often than in randomized networks, forming elementary building blocks that carry out key functions. This is a convenient representation of the architecture of a set of regulatory Boolean (i.e. ON-OFF) networks, in which each gene is either fully expressed or not expressed at all, and either has a binding site for a transcriptional regulator or lacks such a site. However, this approach has serious limitations because most genes are not expressed in a simple Boolean fashion. Indeed, genes that are co-regulated by the same transcription factor are often differently expressed, with characteristic expression levels and kinetics. Therefore, a deeper understanding of regulatory networks demands the identification of the key features used by a transcriptional regulator to differentially control genes that display distinct behaviours despite belonging to networks with identical motifs.

The identification of the promoter features that determine the distinct expression behavior of co-regulated genes is a challenging task for three reasons. First, these features are often short combinations of a constrained four-symbol DNA alphabet. Therefore, it is not clear how to distinguish a sequence pattern that could affect gene expression from a random sequence that is only slightly different [1,2].
Second, the sequences recognized by a transcription factor may differ from promoter to promoter within and between genomes and may be located at various distances from other cis-acting features in different promoters [3,4]. Third, similar expression patterns can be generated from different or a mixture of multiple underlying features, thus making it more difficult to discern the causes of analogous regulatory effects.

In this study, we present a method specifically aimed at handling the variability in sequence, location and topology that characterize gene transcription. We decompose a feature into a family of models or building blocks that uncover important differences among observations that are often concealed when using global patterns that tend to average sequences between promoters and even across species. This approach maximizes the sensitivity of detecting those instances that weakly resemble a consensus (e.g., binding site sequences) without decreasing the specificity. In addition, features are considered using fuzzy assignments, which allow us to encode how well a particular sequence matches each of the multiple models for a given promoter feature. Individual features can be linked into more informative composite models that can be used to explain the kinetic expression behavior of genes.

We applied our method to analyze promoters controlled by the PhoP/PhoQ regulatory system of Escherichia coli and Salmonella enterica serovar Typhimurium. This system responds to the same inducing signal (i.e. low Mg2+) in both species [4-7]. Moreover, the E. coli phoP gene could complement a Salmonella phoP mutant [8]. The DNA-binding PhoP protein appears to recognize a tandem repeat sequence separated by 5 bp [4-6], consistent with being a dimer [9]. The PhoP/PhoQ system is an excellent test case because it controls the expression of a large number of genes, amounting to ca. 3% of the genes in the case of Salmonella [10].
Furthermore, the PhoP/PhoQ regulon has been shown to employ a variety of network motifs including the single-input module (Fig. 1A),
Results and discussion

Approach
We investigated five types of cis-acting promoter features by extracting the maximal amount of useful information from datasets and then creating models that describe promoter regulatory regions. This entailed applying three key strategies. First, we conducted an initial survey of the data provided from different available sources, capturing and distinguishing between broad and easily discernible patterns. We then used these patterns as models to re-visit the data with greater sensitivity and specificity. This allowed us not only to recognize those instances with a low resemblance to consensus models, but also to reflect and annotate the diversity of the observations (e.g., when distances between the transcription factor binding site and RNA polymerase are unusual). Second, we utilized fuzzy clustering methods [13,14] to encode promoter matching to multiple models for a given promoter feature, which avoided having to make premature categorical assignments and produced an initial classification of the promoters into multiple subsets. Finally, we applied fuzzy logic [15] to relate some basic features into more informative composite models that may explain the distinct expression behavior of genes belonging to similar networks (Fig. 2).
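To make the idea of fuzzy assignments and composite models concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a trapezoidal membership function and the common minimum/maximum operators for fuzzy AND/OR; the function names and all numeric breakpoints are hypothetical and chosen only for illustration.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal fuzzy membership (assumes a < b <= c < d).

    Returns 0 outside (a, d), 1 on [b, c], and a linear ramp in between.
    """
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def fuzzy_and(*degrees):
    """Minimum t-norm: a composite model holds only as well as its weakest feature."""
    return min(degrees)


def fuzzy_or(*degrees):
    """Maximum t-conorm: the best of several alternative matches."""
    return max(degrees)


# Hypothetical example: a promoter whose binding site lies 40 bp from the
# RNA polymerase site and matches a binding-site model to degree 0.7.
close_distance = trapezoid(40, 0, 10, 30, 50)   # hypothetical breakpoints
site_match = 0.7                                # hypothetical degree of match
composite = fuzzy_and(close_distance, site_match)
print(close_distance, composite)
```

Because every feature is kept as a degree in the unit interval, a promoter can partially match several models at once, and the categorical decision is deferred until the features are combined.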
Activated/repressed promoters
Gene expression data normally allow clear separation of genes into those that are activated and those that are repressed by a regulatory protein. Because the expression signal is sometimes absent or too low to be informative, we considered the location of a transcription factor binding site relative to that of the RNA polymerase to separate promoters into activated and repressed subsets (Fig. 3A, B).
We determined that the location of binding sites functioning in activation is different from that corresponding to sites functioning in repression (Fig. 3A, B).

Transcription factor binding site orientation
Functional binding sites for a transcription factor may be present in either orientation relative to the RNA polymerase binding site [21]. This is due to the possibility of DNA looping and to the flexibility of the alpha subunit of the bacterial RNA polymerase in its interactions with transcriptional regulators [22,23]. Yet, promoters harboring binding boxes in different orientations can be controlled by PhoP using the same network motif. This is the case for the yobG and slyB promoters (direct orientation) compared with the pagK and pagC promoters (opposite orientation) of Salmonella (Fig. 4A).
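Detecting a binding site in either orientation amounts to scoring both the sequence and its reverse complement. The sketch below illustrates this under simplifying assumptions: the scoring matrix is a toy three-position matrix, not a PhoP model, and the function names and threshold are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def score(pwm, window):
    """Sum the per-position weights of a window under a position weight matrix."""
    return sum(pwm[i][base] for i, base in enumerate(window))


def scan_both_strands(seq, pwm, threshold):
    """Report (start, strand, score) for every window scoring >= threshold.

    Start positions are always given in forward-strand coordinates.
    """
    width = len(pwm)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            sc = score(pwm, s[i:i + width])
            if sc >= threshold:
                start = i if strand == "+" else len(seq) - i - width
                hits.append((start, strand, sc))
    return hits


# Toy matrix that rewards the word "GTT" (hypothetical, for illustration only).
toy_pwm = [
    {"A": 0, "C": 0, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 0, "T": 1},
]
hits = scan_both_strands("AAGTTAAAACCC", toy_pwm, threshold=3)
print(hits)
```

Recording the strand alongside the position keeps the orientation available as a separate feature, so direct and opposite sites can later be compared within the same network motif.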
Transcription factor binding site patterns
Many genes are controlled by a single-input network motif where the affinity of a transcription factor for its promoter sequences is a major determinant of gene expression. Thus, co-regulated genes displaying distinct expression patterns are likely to differ in the binding site for such a transcription factor (Fig. 4B). We decomposed the set of binding site sequences corresponding to a transcription factor into several patterns and then combined them to increase the sensitivity to weak sites without losing specificity (a detailed sensitivity performance analysis and the evolutionary effects of these patterns are described in O.H. et al, manuscript in preparation). In the case of PhoP, we used this approach to search both strands of the intergenic regions of the E. coli and Salmonella genomes (Fig. 2).

Riboswitch site patterns
Riboswitches are structured domains that usually reside in the non-coding regions of mRNAs (UTRs), where they bind specific metabolites and control gene expression. The most common effects occur at the level of premature termination of transcription (cis-acting) or translation initiation. Because Rfam (http://www.sanger.ac.uk/Software/Rfam) searches did not produce significant hits, upstream regions of PhoP-regulated genes were screened for riboswitches by analyzing the presence of segments with secondary structure conserved across genomes and with thermodynamic stability. Then, we evaluated whether these candidate segments could be either small non-coding RNAs or riboswitches, depending on their location relative to the beginning of the gene. Those candidates with conserved helices, thermodynamically stable energy, and a location close (<5 bp) to the translation start site of the closest gene were further inspected as possible riboswitches. We found several genes with a long UTR region as possible candidates (see http://gps-tools2.wustl.edu/data/riboswitch.xls).
One of these genes is the Salmonella mgtA promoter, which has been experimentally validated (Fig. 4C).

RNA polymerase binding site patterns and location
The distance of a transcription factor binding site to the RNA polymerase binding site(s) and the class of sigma 70 promoter are critical determinants of gene expression [22]. These classes correspond to the different types of contacts that can be established between a transcription factor and RNA polymerase. We identified six patterns among PhoP-regulated promoters of E. coli and Salmonella (Fig. 2). Some PhoP-regulated promoters (e.g. the hemL and phoP promoters of E. coli) contain several putative RNA polymerase binding sites located at different positions and belonging to different classes, suggesting that such promoters may be regulated by additional signals and/or transcription factors [6].

The RNA polymerase site feature was evaluated using 721 RNA polymerase sites from RegulonDB as positive examples and 7210 random sequences as negative examples. We obtained an 82% sensitivity and 95% specificity for detecting RNA polymerase sites. These values provide a false discovery rate <0.001 and a correlation coefficient of 82%. In addition, we selected 34 examples of RNA polymerase sites reported to be of class II, which all differ from the typical class I promoter by exhibiting a degenerate -35 sequence motif [6,22,32], and obtained 74% sensitivity and 95% specificity.

Binding sites for other transcription factors
Certain promoters harbor binding sites for more than one transcription factor. This could be because transcription requires the concerted action of such proteins, or because the promoter is independently activated by individual transcription factors, each responding to a distinct signal. We analyzed the intergenic regions of the E. coli and Salmonella genomes for the presence of binding sites for 54 transcription factors [30].
We then investigated the co-occurrence of 24 sites with the binding site of the PhoP protein in an effort to uncover different types of network motifs involving PhoP-regulated promoters. For example, the Salmonella pmrD, ugd and yrbL promoters and the E. coli yrbL promoter harbor PhoP- and PmrA-binding sites, consistent with the experimentally verified regulation by both the PhoP and PmrA proteins that can be described by the bi-fan network motif [4,33] (Fig. 4E).

By considering the presence of binding sites for multiple transcription factors, it is possible to generate hypotheses about potential network motifs. For example, the promoters of the PhoP-activated gadA, dps, hdeA, yhiE and yhiW genes of E. coli also have binding sites for the regulatory proteins YhiX and YhiE [4], raising the possibility that some of these genes might be regulated by feedforward loops where both the PhoP protein and either the YhiW or the YhiE proteins would bind to the same promoter to activate transcription. This notion was experimentally verified [4], validating our prediction.

Evaluating the effect of distinct cis-regulatory features within a network motif
Gene expression is often measured by binary assays that evaluate differentials between wild-type and mutant strains (e.g., typical microarrays). These experiments always help to differentiate activated from repressed genes, and sometimes very low from very highly expressed genes. However, these approaches often conceal quantitative differences between truly expressed genes. We hypothesized that distinct promoter features may affect gene expression even in similarly arranged network motifs. To test this notion, we compared the gene expression patterns of wild-type Salmonella harboring plasmids with a transcriptional fusion between a promoterless gfp gene and different PhoP-activated promoters (Fig. 6).
We found that promoters that differ in the orientation of the PhoP binding site and are arranged in a similar network motif, such as slyB and pagC, produce completely different patterns of expression (Fig. 4A). We also realized that the expression patterns differ in other types of network motifs such as the bi-fan. The Salmonella pmrD and ugd promoters harbour experimentally validated PhoP- and PmrA-boxes [10,34] (Fig. 4E).

Conclusion
We demonstrated that a transcription factor could mediate differential expression of genes described by the same network motif. This is because of the functional significance of the variability in sequence, location and topology that exists among promoters that are co-regulated by a given transcription factor. We developed methods that encode and combine these promoter features, allowing cis-observations to be matched to multiple models for a given promoter feature, into flexible databases constituting annotations of genome regulatory regions. These annotations cannot be uncovered by simpler sequence analysis approaches (Fig. 7).
Global transcriptional regulators control multiple promoters by a variety of network motifs [27]. This is clearly the case for the regulatory protein PhoP (Fig. 1).

Materials and methods
Our method consists of four phases. First, encoding the available information into preliminary model-based features, which includes identifying cis-features from DNA sequences and information from available databases, and performing initial modeling of each individual feature, allowing multiple occurrences of a feature to be processed, using relaxed thresholds and permitting missing values. A model-based feature is generated by the identification of a feature in a subset of observations (F) in the dataset, based on measuring the degree of match (Q) between an observation and a model, or a family of models (M = {Mα}), at some degree (α) defined in a unit-interval scale (i.e., fuzzy values, Q(F, Mα)) [35,36]. Second, grouping the results into subsets, thus decomposing the preliminary models into a family of models or building blocks by using fuzzy clustering (see Additional file 1). Third, composing the building blocks by combining either the same or different types of features using fuzzy logic expressions (see Additional file 1). Fourth, describing new promoters using the resulting models.

Network motifs
In theory, the term "network motif" refers to a statistically significant subgraph; however, in practice, network motifs are treated as over-represented subgraphs [37,38]. For example, a motif termed "single input motif" of three/four nodes in the E. coli (e.g., mfinder1.2 p-value < 34.7 ± 8.5) or Saccharomyces cerevisiae network [39] is not recognized as significant, while the only motif that exceeds the standard threshold is the "feed forward motif".

Activated/repressed
We modeled PhoP-regulated promoters as activated or repressed based on examples reported in the RegulonDB database [30].
(1) We separately grouped activated and repressed promoters, and plotted histograms for each group corresponding to the distances between transcription factor binding sites and the transcription initiation (+1) site.
(2) We distinguished two non-disjoint distributions in each group and built models for these distances by fitting histograms with fuzzy membership functions [15] (Fig. 3A, B).

Binding site patterns and orientation
(1) We built an initial model for the PhoP binding site by learning a position weight matrix [28] (E-value < 10E-12) based on the upstream sequences of genes corresponding to the training set of the E. coli and Salmonella genomes (Table S1, Additional file 1).
(2) We searched the intergenic regions of the genes in both orientations, using low thresholds corresponding to two standard deviations below the mean score obtained with the initial model [40]. Multiple PhoP binding site candidates were allowed in a given promoter operator region.
(3) After transforming nucleotides into dummy variables [41], we grouped sequences matching the PhoP position weight matrix using the fuzzy C-means clustering method with the Xie-Beni validity index (see Additional file 1) to estimate the number of clusters [13,42].
(4) We built models for these clusters using position weight matrices (E-value < 10E-22) and searched the E. coli and Salmonella genomes to characterize each gene according to its similarity to each model as a fuzzy partition (Fig. 2).

Riboswitch site patterns
(1) We employed upstream regions of PhoP-regulated genes to create conserved sequence alignments by comparisons against representative proteobacterial genomes. We used WU BLAST 2.0 (http://blast.wustl.edu) [43] with a word hit of eight and default parameters otherwise.
(2) We selected alignments with an E-value ≤ 0.00001 and a length ≥ 50 nt, and divided alignments longer than 300 bp into windows of 300 bp with 50 bp of overlap.
(3) These windows were fed to the programs eQRNA and RNAz following the protocol described in [44], using a window size of 200 nucleotides and a window slide increment of 50 nucleotides. QRNA analysis was performed with eQRNA version 2.0.3c (ftp://selab.janelia.org/pub/software/qrna/).
(3.1) We classified the alignment as RNA, coding, or other, according to the Bayesian posterior probability of each model. RNAz version 0.1.1 (http://www.tbi.univie.ac.at/~wash/RNAz) was used. We only considered overlapping eQRNA and RNAz predictions for the upstream regions of PhoP-regulated genes as candidates for small non-coding RNAs or riboswitches.
(4) We encoded the conservation identity of the segments and their distance to the translation start site of the closest gene as fuzzy sets, and aggregated them using fuzzy expressions (see Additional file 1).
(5) All fuzzy expressions of a single gene were combined using the Maximum T-conorm (see Additional file 1).

RNA polymerase sites
(1) We gathered sigma 70 class I and class II promoters [32,45] from the RegulonDB database and [46]. Then, we built models of the RNA polymerase site using a neuro-fuzzy method (see HPAM in http://gps-tools2.wustl.edu [47]), and used the resulting models to perform genome-wide descriptions of the intergenic regions of the E. coli and Salmonella genomes with a false discovery rate <0.001 (see Promoter search in http://gps-tools2.wustl.edu).
(2) We used an intelligent parser that differentiates class I and class II promoters by evaluating the quality of the -35 motif [22,32], based on fuzzy logic (see Additional file 1) and genetic algorithm techniques (see MOSS in gps-tools2.wustl.edu [48]).
(3) To characterize the distance relationship between transcription factor binding sites and RNA polymerase binding sites, we built models of such distances from the examples reported in the RegulonDB database.
(3.1) We modeled activated and repressed promoters (see the Activated/repressed feature).
(3.2) We re-built histograms for each group of distances (i.e. activated and repressed), distinguishing three overlapping distributions for each of them.
(3.3) We built models for distances by fitting their distributions into models based on fuzzy membership functions [15] (see Additional file 1), which were termed close, medium and remote distances for each set of activated and repressed genes (Fig. 3C). This process allowed us to retrieve the most representative RNA polymerase binding site candidates for each promoter region relative to the PhoP binding site (e.g., best class II RNA polymerase site, which is located close to the PhoP box in an activated promoter), which were arrayed and constituted the value of the RNA polymerase site feature in Fig. 2.

Binding sites for other transcription factors
We developed models for different transcription factor binding sites from the RegulonDB database as follows:
(1) We built position weight matrices for each transcription factor using the Consensus/Patser program, choosing the best final matrix for motif lengths between 14–30 bp if the corresponding length had not been previously specified (see "Consensus matrices" in http://gps-tools2.wustl.edu). We accounted for the motif symmetry (e.g., asymmetric, direct, inverted [45]) if available (see "Search known transcription factor motifs" in http://gps-tools2.wustl.edu).
(2) We searched the intergenic regions of the E. coli and Salmonella genomes with these models, using the correlation coefficient measure (see Additional file 1) and an additional 772 promoters from the RegulonDB database [30] to establish a threshold (average E-value < 10E-10) for each matrix [50] (see "Thresholded consensus" in http://gps-tools2.wustl.edu).
(3) We accounted for the distances between distinct transcription factor binding sites occurring in the same promoter region (e.g., the distance between the CRP and FIS sites in the proP promoter [51]) in promoters reported in the RegulonDB database and built a histogram with the obtained results (Fig. 3D).

Dataset
We initially used the intergenic regions of E. coli and Salmonella operons from -800 to +50 because > 5% are larger than 800 bp in bacterial genomes (as described in the RegulonDB database or generously provided by H. Salgado) [49]; however, predictions have been performed in whole coding and non-coding regions (see http://gps-tools2.wustl.edu). The promoter and transcription factor information was taken from the RegulonDB database. We compiled from the literature and our own lab information (Table S1, Additional file 1) genes whose expression (using microarrays) differed statistically between wild-type and phoP E. coli strains experiencing inducing conditions for the PhoP/PhoQ regulatory system [4], as well as a list of genes known/assumed to be PhoP regulated [52]. However, this information did not explicitly indicate whether these genes were regulated directly or indirectly by the PhoP protein. The learned features were used to make genome-wide predictions in the E. coli and Salmonella genomes.

Programming resources
The scripts and programs used in this work, some of which are accessible from the http://gps-tools2.wustl.edu web site, were based on Perl, Matlab r2006a and C++ interpreters/languages, and the visualization routines were performed on Spotfire DecisionSite software 8.2. Data and predictions for E. coli and Salmonella genomes are available at supplemental table S1 in Additional file 1 and at http://gps-tools2.wustl.edu.

Bacterial strains, plasmids and growth conditions
Bacterial strains and plasmids used in this study are listed in Table S2, Additional file 1.
Salmonella enterica serovar Typhimurium strains used in this study are derived from strain 14028s. Bacteria were grown at 37°C in Luria-Bertani broth (LB) [53] or N-minimal medium pH 7.7 [54] supplemented with 0.1% Casamino Acids, 38 mM glycerol and MgCl2. Kanamycin was used at 25 μg/ml.

Construction of GFP reporter plasmids
Promoter regions (i.e. the intergenic region between two ORFs) were amplified using PCR. A list of the promoter-specific primers used in the PCR reactions is shown in Table S3, Additional file 1. The PCR fragment was digested with BamHI and XhoI, purified, then introduced into the cloning site of pMS201 (GFP reporter vector plasmid, a gift from Alon, U. [55]). Sequences of the promoter regions were verified by nucleotide sequencing.

Measurements of promoter activity and growth kinetics for GFP reporter strains
Promoter activity and growth kinetics of wild-type Salmonella strains harboring a GFP reporter plasmid were measured in parallel using an automated microplate reader (VICTOR3, Perkin Elmer) [55]. Overnight cultures of strains in N-minimal medium with 10 mM MgCl2 and 25 μg/ml of kanamycin were washed with the same medium without MgCl2, then diluted (1:100) into a 96-well plate (Packard) containing 150 μl of N-minimal medium supplemented with 50 μM MgCl2. After overlaying the wells with 50 μl of mineral oil (Sigma) to prevent evaporation of the medium, the plate was inserted in the VICTOR3 machine pre-warmed to 37°C. The fluorescence and optical density (600 nm) of cells were recorded with shaking of the plate (1 min with 0.1 mm diameter), and this protocol was repeated every 6 min, 99 times. The background fluorescence was measured using a strain carrying the empty vector and subtracted from the test values. Each experiment was conducted independently twice, and a representative is shown in the figures.

Data preprocessing
The raw GFP and OD signals were used to calculate the promoter activity as [dGi(t)/dt]/ODi(t).
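The promoter activity calculation can be sketched as follows. This is an illustrative sketch, not the original Matlab code: it assumes evenly spaced readings and approximates the derivative with finite differences (central in the interior, one-sided at the ends); the function name and the sample values are hypothetical.

```python
def promoter_activity(gfp, od, dt):
    """Approximate [dG(t)/dt] / OD(t) for evenly spaced readings.

    gfp and od are equal-length sequences of background-subtracted
    fluorescence and optical density; dt is the sampling interval.
    """
    n = len(gfp)
    if n < 2 or n != len(od):
        raise ValueError("need at least two readings and equal-length signals")
    activity = []
    for i in range(n):
        if i == 0:
            slope = (gfp[1] - gfp[0]) / dt            # forward difference
        elif i == n - 1:
            slope = (gfp[-1] - gfp[-2]) / dt          # backward difference
        else:
            slope = (gfp[i + 1] - gfp[i - 1]) / (2 * dt)  # central difference
        activity.append(slope / od[i])
    return activity


# Hypothetical readings taken every 6 min, matching the sampling interval above.
gfp_signal = [0.0, 6.0, 12.0, 18.0]
od_signal = [1.0, 1.0, 2.0, 2.0]
print(promoter_activity(gfp_signal, od_signal, dt=6.0))
```

Dividing by the optical density normalizes the rate of fluorescence accumulation by the amount of cells, so strains that grow at different rates can be compared. Differentiating raw readings amplifies measurement noise, which is why a smoothing step is needed afterwards.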
The activity signal was then smoothed with a shape-preserving interpolant (Piecewise Cubic Hermite Interpolating Polynomial, Matlab r2006a), a fitting algorithm that finds values of an underlying interpolating function at intermediate points that are not described in the experimental assays. Then, we applied a polynomial fit (sixth order, Matlab r2006a) to each expression signal. This smoothing procedure captures the dynamics well, while removing the noise inherent in the differentiation of noisy signals.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
OH and IZ designed and implemented the methods and wrote the manuscript; CV designed and implemented the riboswitch identification methods; RRZ coded the Perl scripts and the web page; DS performed the experimental validation using GFP technology; HH provided advice on the project and revised the manuscript; EAG supported the project and drafted the manuscript.

Additional File 1
Supplemental tables. Table S1 provides the features describing PhoP-regulated promoters and the raw data used to build them. Table S2 details the bacterial strains and plasmids used in this study, and Table S3 the primers used to construct the promoters in GFP reporter plasmids.

Acknowledgements
We thank U. Alon (Weizmann Institute of Science, Israel) for plasmid pMS201, and Elena Rivas (Janelia Farm Research Campus, Howard Hughes Medical Institute) for providing the computational tools to identify riboswitches. This research was supported in part by the Spanish Ministry of Science and Technology under project TIN2006-12879 and by Consejería de Innovación, Investigación y Ciencia de la Junta de Andalucía under project TIC02788. E.A.G. is an Investigator of the Howard Hughes Medical Institute.
This article has been published as part of BMC Bioinformatics Volume 10 Supplement 4, 2009: Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2008. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/10?issue=S4. References