Nucleic Acids Res. Jul 1, 2004; 32(Web Server issue): W318–W320.
PMCID: PMC441502

PredictRegulon: a web server for the prediction of the regulatory protein binding sites and operons in prokaryote genomes


An interactive web server has been developed for predicting the potential binding sites of a given regulatory protein, together with its target operons, in prokaryotic genomes. The program allows users to submit known or experimentally determined binding sites of a regulatory protein as an ungapped multiple sequence alignment. It analyses the upstream regions of all genes in a user-selected prokaryotic genome and returns the potential binding sites along with the downstream co-regulated genes (operons). The known binding sites of a regulatory protein can also be used to identify orthologous binding sites in phylogenetically related genomes, where the trans-acting regulatory protein and its cognate cis-acting DNA sequences may be conserved. PredictRegulon can be freely accessed from a link on our World Wide Web server: http://www.cdfd.org.in/predictregulon/.


With over 100 bacterial genomes sequenced, a key challenge of post-genomic research is to dissect the complex transcriptional regulatory network that controls the metabolic and physiological processes of a cell. A first step towards this goal is to identify the genes in a genome that are controlled by a specific transcriptional regulatory protein. This paper describes a web server tool, PredictRegulon, for genome-wide prediction of the potential binding sites and target operons of a regulatory protein for which only a few experimentally identified binding sites are known. The approach can exploit the available experimental data on binding sites of transcriptional regulatory proteins from various bacterial species (1–3) to identify regulons in phylogenetically related species.


PredictRegulon first constructs a binding-site recognition profile from an ungapped multiple sequence alignment of the known binding sites. The profile is calculated using Shannon's positional relative entropy approach (4). The positional relative entropy $Q_i$ at position i of a binding site is defined as

$$Q_i = \sum_{b \in \{A,T,G,C\}} f_{b,i} \log_2 \frac{f_{b,i}}{q_b}$$

where b ranges over the four possible bases (A, T, G, C), $f_{b,i}$ is the observed frequency of base b at position i, and $q_b$ is the frequency of base b in the genome sequence. The contribution of each base to the positional relative entropy is obtained by multiplying the frequency of that base by the positional relative entropy:

$$W_{b,i} = f_{b,i} \, Q_i$$

where $W_{b,i}$ is the weighted Shannon relative entropy of base b (A, T, G, C) at position i. Finally, a 4 × L entropy matrix (where L is the length of the binding site) is constructed as the binding-site recognition profile; each matrix element is the weighted positional relative entropy of one base.
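The profile construction just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the server's own code: it assumes base-2 logarithms, uses no pseudocounts (a base absent at a position simply gets $W_{b,i} = 0$), and takes the genomic base frequencies $q_b$ as a user-supplied dictionary. The function name `build_profile` is hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_profile(sites, background):
    """Build a 4 x L matrix of weighted relative entropies W[b][i].

    sites      -- known binding sites as an ungapped alignment (equal-length strings)
    background -- genomic base frequencies q_b, e.g. {"A": 0.3, "C": 0.2, ...}
    Returns a dict mapping each base to a list of L weights.
    Assumptions: log base 2, no pseudocounts.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be an ungapped alignment of equal length")
    n = len(sites)
    profile = {b: [0.0] * length for b in BASES}
    for i in range(length):
        counts = Counter(s[i].upper() for s in sites)
        freqs = {b: counts[b] / n for b in BASES}  # f_{b,i}
        # Q_i = sum_b f_{b,i} * log2(f_{b,i} / q_b); absent bases add 0 (0 log 0 = 0)
        q_i = sum(f * math.log2(f / background[b]) for b, f in freqs.items() if f > 0)
        for b in BASES:
            profile[b][i] = freqs[b] * q_i  # W_{b,i} = f_{b,i} * Q_i
    return profile


if __name__ == "__main__":
    uniform = {b: 0.25 for b in BASES}
    matrix = build_profile(["AC", "AG"], uniform)
    for b in BASES:
        print(b, matrix[b])
```

For the toy alignment `AC`, `AG` with a uniform background ($q_b = 0.25$), the first position is fully conserved, so $Q = \log_2 4 = 2$ bits and $W_{A} = 2$; the second position is split evenly between C and G, giving $Q = 1$ bit and $W = 0.5$ for each of those bases. Real input would use the genome's own base composition for $q_b$.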

The profile, encoded as this matrix, is used to scan the upstream sequences of all genes in the user-selected genome. The score of each candidate site is the sum of the positional base entropies ($W_{b,i}$) of its nucleotides, and the maximally scoring site is selected from the upstream sequence of each gene. This score may reflect the strength of the interaction between the regulatory protein and the binding site (5). The lowest score among the input sites is taken as the cut-off, and sites scoring higher than this cut-off are reported as potential binding sites conforming to the consensus profile.
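The scanning and cut-off steps can be sketched as follows. This is a minimal sketch, assuming the profile is a dictionary mapping each base to a list of L weights $W_{b,i}$ and that upstream regions arrive as a hypothetical `{gene: sequence}` dictionary. The function names are illustrative, and only the given strand is scanned, since the text does not say whether the reverse complement is also examined.

```python
def score_site(profile, site):
    """Sum W_{b,i} over the site; bases outside A/C/G/T (e.g. N) contribute 0."""
    total = 0.0
    for i, base in enumerate(site.upper()):
        if base in profile:
            total += profile[base][i]
    return total


def best_site(profile, upstream):
    """Return (score, start, window) of the top-scoring window, or None if too short."""
    width = len(profile["A"])
    best = None
    for start in range(len(upstream) - width + 1):
        window = upstream[start:start + width]
        score = score_site(profile, window)
        if best is None or score > best[0]:
            best = (score, start, window)
    return best


def predict_sites(profile, known_sites, upstream_by_gene):
    """Report, per gene, the best upstream site if it beats the weakest known site."""
    cutoff = min(score_site(profile, s) for s in known_sites)
    hits = []
    for gene, seq in upstream_by_gene.items():
        hit = best_site(profile, seq)
        if hit is not None and hit[0] > cutoff:  # strictly higher, as in the text
            score, start, window = hit
            hits.append((gene, start, window, score))
    hits.sort(key=lambda h: h[3], reverse=True)  # strongest predicted sites first
    return cutoff, hits


if __name__ == "__main__":
    # Hypothetical 2-bp profile and sequences, for illustration only.
    toy_profile = {"A": [2.0, 0.0], "C": [0.0, 0.5], "G": [0.0, 0.5], "T": [0.0, 0.0]}
    cutoff, hits = predict_sites(toy_profile, ["AC", "TG"],
                                 {"g1": "TTACTT", "g2": "TTTTTT", "g3": "GGAGN"})
    print(cutoff, hits)
```

Because the comparison is strict, a genomic site that exactly ties the weakest input site is not reported; switching to `>=` would include such ties.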

Co-directionally transcribed genes downstream of a predicted binding site are selected as potential co-regulated genes (operons) if they satisfy any one of the following criteria: (i) the co-directionally transcribed orthologous gene pair is conserved in at least three genomes (6); (ii) the genes belong to the same COG (Clusters of Orthologous Groups) functional category and the intergenic distance is <200 bp (7); (iii) the first three letters of the gene names are identical (gene names for all bacterial species were assigned using the COG annotation); or (iv) the intergenic distance is <90 bp (8).
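One way to apply these rules is a greedy walk downstream from the first gene after a predicted site, adding each adjacent co-directional gene while any of criteria (i)–(iv) holds. This is a sketch under assumptions: the `Gene` record, its coordinate convention and the `conserved_pairs` set (a stand-in for the precomputed comparative result behind criterion (i)) are all hypothetical, and the server's actual data sources and ordering of checks are not described in the text.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str          # COG-derived gene name (hypothetical record layout)
    start: int         # leftmost coordinate
    end: int           # rightmost coordinate
    strand: str        # "+" or "-"
    cog_category: str  # COG functional category, "" if unassigned


def intergenic_distance(a, b):
    """Gap in bp between two adjacent genes (negative if they overlap)."""
    return max(a.start, b.start) - min(a.end, b.end) - 1


def same_operon(a, b, conserved_pairs):
    """True if adjacent co-directional genes a, b meet any of criteria (i)-(iv)."""
    gap = intergenic_distance(a, b)
    if frozenset((a.name, b.name)) in conserved_pairs:                     # (i)
        return True
    if a.cog_category and a.cog_category == b.cog_category and gap < 200:  # (ii)
        return True
    if len(a.name) >= 3 and a.name[:3] == b.name[:3]:                      # (iii)
        return True
    return gap < 90                                                        # (iv)


def operon_from(genes, first, conserved_pairs=frozenset()):
    """Walk downstream from `first` (the gene just after a predicted site)."""
    ordered = sorted(genes, key=lambda g: g.start)
    step = 1 if first.strand == "+" else -1  # downstream direction on the chromosome
    operon = [first]
    j = ordered.index(first) + step
    while 0 <= j < len(ordered):
        nxt = ordered[j]
        if nxt.strand != first.strand or not same_operon(operon[-1], nxt, conserved_pairs):
            break
        operon.append(nxt)
        j += step
    return operon


if __name__ == "__main__":
    a = Gene("abcA", 100, 700, "+", "K")
    b = Gene("defB", 760, 1000, "+", "L")   # 59 bp gap: criterion (iv)
    c = Gene("ghiC", 1300, 1500, "+", "M")  # 299 bp gap, no other criterion
    print([g.name for g in operon_from([c, a, b], a)])
```

Criterion (iii) only makes sense for descriptive COG-derived gene names such as those the authors assigned; applied to systematic locus tags that share a prefix, it would wrongly link unrelated genes.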

This method has two specific requirements: a few experimentally determined binding sites of the regulatory protein must be available to build the recognition profile, and the profile should be applied only to genomes in which the regulator or its homologue is present. In the absence of experimental information on regulatory sites in a given genome, one may retrieve known regulatory motifs of related species from one of the four online databases that host information about known transcriptional regulatory protein binding sites in prokaryotic genomes (1–3).

A limitation of this approach is that it may report a few false-positive sites among the candidates. Such sites can be weeded out by experimental validation, either by in vitro binding assays with the regulatory protein and double-stranded oligonucleotides containing the predicted binding sites, or by real-time PCR analysis of the candidate co-regulated genes.


To demonstrate typical usage of PredictRegulon, we predicted the LexA binding sites and the LexA regulon of M. tuberculosis using the known LexA binding sites of Bacillus subtilis. The LexA regulators of B. subtilis and M. tuberculosis share high sequence identity (45%) at the protein level (data not shown). Table 1 lists the known LexA binding sites of B. subtilis (2) that were given as input to the program, and Table 2 shows the output, the predicted LexA binding sites in M. tuberculosis; the site column of Table 2 gives the predicted binding sites. In the web output, perfect matches to the known binding sites and their downstream genes are highlighted with a yellow background, and the remaining sites scoring above the cut-off with a blue background (colours not shown in the table). Eighteen of these genes (marked ‘a’) belonging to the LexA regulon were also identified experimentally by others (9–12). The remaining matches are potential novel regulatory sites that could be confirmed experimentally.

Table 1.
Known LexA binding sites of Bacillus subtilis from the PRODORIC database
Table 2.
Output of PredictRegulon web server (predicted LexA binding sites)

The web output of PredictRegulon also contains hyperlinked gene synonyms and COG numbers. Clicking a gene synonym shows the predicted operon context of the regulatory motif, while clicking a COG number opens a new page describing the gene in the NCBI Conserved Domain Database, which in turn links to PubMed for published information on the gene. These links give users a simple way to browse and understand the functional and physiological implications of the genes in the predicted regulon.


This work is partially supported by the Council of Scientific and Industrial Research (CSIR) NMITLI Grant to A.R. Y.S. and S.K. were recipients of Senior Research Fellowships from CSIR, Govt. of India.


1. Salgado,H., Santos-Zavaleta,A., Gama-Castro,S., Millan-Zarate,D., Diaz-Peredo,E., Sanchez-Solano,F., Perez-Rueda,E., Bonavides-Martinez,C. and Collado-Vides,J. (2001) RegulonDB (Version 3.2): transcriptional regulation and operon organization in Escherichia coli K-12. Nucleic Acids Res., 29, 72–74.
2. Munch,R., Hiller,K., Barg,H., Heldt,D., Linz,S., Wingender,E. and Jahn,D. (2003) PRODORIC: prokaryotic database of gene regulation. Nucleic Acids Res., 31, 266–269.
3. Ishii,T., Yoshida,K., Terai,G., Fujita,Y. and Nakai,K. (2001) DBTBS: a database of Bacillus subtilis promoters and transcription factors. Nucleic Acids Res., 29, 278–280.
4. Shannon,C.E. (1948) A mathematical theory of communication. Bell Sys. Tech. J., 27, 379–423 and 623–656.
5. Benos,P.V., Bulyk,M.L. and Stormo,G.D. (2002) Additivity in protein-DNA interactions: how good an approximation is it? Nucleic Acids Res., 30, 4442–4451.
6. Ermolaeva,M.D., White,O. and Salzberg,S.L. (2001) Prediction of operons in microbial genomes. Nucleic Acids Res., 29, 1216–1221.
7. Salgado,H., Moreno-Hagelsieb,G., Smith,T.F. and Collado-Vides,J. (2000) Operons in Escherichia coli: genomic analyses and predictions. Proc. Natl Acad. Sci. USA, 97, 6652–6657.
8. Strong,M., Mallick,P., Pellegrini,M., Thompson,M.J. and Eisenberg,D. (2003) Inference of protein function and protein linkages in Mycobacterium tuberculosis based on prokaryotic genome organization: a combined computational approach. Genome Biol., 4, R59.
9. Durbach,S.I., Andersen,S.J. and Mizrahi,V. (1997) SOS induction in mycobacteria: analysis of the DNA-binding activity of a LexA-like repressor and its role in DNA damage induction of the recA gene from Mycobacterium smegmatis. Mol. Microbiol., 26, 643–653.
10. Brooks,P.C., Movahedzadeh,F. and Davis,E.O. (2001) Identification of some DNA damage-inducible genes of Mycobacterium tuberculosis: apparent lack of correlation with LexA binding. J. Bacteriol., 183, 4459–4467.
11. Dullaghan,E.M., Brooks,P.C. and Davis,E.O. (2002) The role of multiple SOS boxes upstream of the Mycobacterium tuberculosis lexA gene: identification of a novel DNA-damage-inducible gene. Microbiology, 148, 3609–3615.
12. Boshoff,H.I., Reed,M.B., Barry,C.E. and Mizrahi,V. (2003) DNAE2 polymerase contributes to in vivo survival and the emergence of drug resistance in Mycobacterium tuberculosis. Cell, 113, 183–193.
