Proc Natl Acad Sci U S A. 2002 Sep 3;99(18):11772-7. Epub 2002 Aug 14.

Identification of the binding sites of regulatory proteins in bacterial genomes.

Author information

  • Department of Biochemistry, University of California, 513 Parnassus Avenue, San Francisco, CA 94143, USA. haoli@phy.ucsf.edu

Abstract

We present an algorithm that extracts the binding sites (represented by position-specific weight matrices) for many different transcription factors from the regulatory regions of a genome, without the need for delineating groups of coregulated genes. The algorithm uses the fact that many DNA-binding proteins in bacteria bind to a bipartite motif with two short segments more conserved than the intervening region. It identifies all statistically significant patterns of the form W(1)N(x)W(2), where W(1) and W(2) are two short oligonucleotides separated by x arbitrary bases, and groups them into clusters of similar patterns. These clusters are then used to derive quantitative recognition profiles of putative regulatory proteins. For a given cluster, the algorithm finds the matching sequences plus the flanking regions in the genome and performs a multiple sequence alignment to derive position-specific weight matrices. We have analyzed the Escherichia coli genome with this algorithm and found approximately 1,500 significant patterns, which give rise to approximately 160 distinct position-specific weight matrices. A fraction of these matrices match the binding sites of one-third of the approximately 60 characterized transcription factors with high statistical significance. Many of the remaining matrices are likely to describe binding sites and regulons of uncharacterized transcription factors. The significance of these matrices was evaluated by their specificity, the location of the predicted sites, and the biological functions of the corresponding regulons, allowing us to suggest putative regulatory functions. The algorithm is efficient for analyzing newly sequenced bacterial genomes for which little is known about transcriptional regulation.
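The dyad search and matrix-building steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the paper's statistical model, so the significance test here is a simple Poisson-style z-score against an independent-base background. The clustering of similar patterns is omitted. The sites passed to `pwm_from_sites` are assumed to be already aligned and gap-free, which stands in for the multiple-alignment step. All function names (`count_dyads`, `significant_dyads`, `pwm_from_sites`, `score_site`) and parameters (`w`, `max_gap`, `z_min`, `pseudocount`) are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def count_dyads(seqs, w, max_gap):
    """Count W1 N(x) W2 occurrences: two w-mers separated by x arbitrary bases, 0 <= x <= max_gap."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for x in range(max_gap + 1):
            span = 2 * w + x
            for i in range(len(s) - span + 1):
                w1, w2 = s[i:i + w], s[i + w + x:i + span]
                # Skip windows whose conserved halves contain N or other ambiguity codes.
                if set(w1 + w2) <= set(BASES):
                    counts[(w1, x, w2)] += 1
    return counts


def base_freqs(seqs):
    """Mononucleotide background frequencies (assumption: independent-base model)."""
    c = Counter(ch for s in seqs for ch in s.upper() if ch in BASES)
    total = sum(c.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases in input")
    return {b: c[b] / total for b in BASES}


def word_prob(word, freqs):
    """Probability of a word under the independent-base background."""
    p = 1.0
    for ch in word:
        p *= freqs[ch]
    return p


def significant_dyads(seqs, w, max_gap, z_min):
    """Return (W1, x, W2, observed, expected, z) for dyads with z >= z_min, most significant first."""
    freqs = base_freqs(seqs)
    counts = count_dyads(seqs, w, max_gap)
    # Number of windows of each span (one span per gap length x) across all sequences.
    windows = {x: sum(max(len(s) - (2 * w + x) + 1, 0) for s in seqs)
               for x in range(max_gap + 1)}
    hits = []
    for (w1, x, w2), obs in counts.items():
        expected = windows[x] * word_prob(w1, freqs) * word_prob(w2, freqs)
        # Poisson approximation: the variance equals the expected count.
        z = (obs - expected) / math.sqrt(expected)
        if z >= z_min:
            hits.append((w1, x, w2, obs, expected, z))
    return sorted(hits, key=lambda h: -h[5])


def pwm_from_sites(sites, background=None, pseudocount=0.5):
    """Log-odds position-specific weight matrix (in bits) from aligned, equal-length sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to equal length")
    n = len(sites)
    matrix = []
    for j in range(length):
        column = Counter(s[j].upper() for s in sites)
        # Pseudocounts keep unseen bases from giving log(0).
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / (n + 4 * pseudocount)) / background[b])
            for b in BASES
        })
    return matrix


def score_site(matrix, seq):
    """Sum of per-position log-odds scores for a candidate site of the matrix's length."""
    return sum(row[ch] for row, ch in zip(matrix, seq.upper()))
```

For example, `significant_dyads(upstream_seqs, 3, 15, 10.0)` would list over-represented dyads in a hypothetical list `upstream_seqs` of regulatory-region strings. Occurrences of a chosen dyad (plus flanks) could then be aligned and passed to `pwm_from_sites` to obtain a weight matrix for scanning with `score_site`.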

PMID: 12181488 [PubMed - indexed for MEDLINE]
PMCID: PMC129344 (free PMC article)
