Bioinformatics. 2004 Mar 1;20(4):569-75. Epub 2004 Jan 22.

Enrichment of transcriptional regulatory sites in non-coding genomic region.

Author information

The State Key Laboratory of Pharmaceutical Biotechnology, School of Life Science, Nanjing University, Nanjing 210093, China.



Over-represented k-mers in non-coding genomic regions often lead to the identification of potential transcriptional regulatory sites (TRS). Many algorithms exploit this phenomenon to predict TRS in silico. Improving these algorithms, however, requires a deeper understanding of the enrichment feature. To obtain a general distributional profile of TRS in different regions of genomes as well as in different genomes, we performed a systematic analysis of the over-representation of TRS in intergenic regions and gene upstream regions of yeast and viral genomes, and of the distributional pattern of TRS in intergenic and intron regions of the Drosophila genome. We also explored how to evaluate the accuracy of TRS consensus sequences by measuring their enrichment.


To measure enrichment, we introduced a statistical background model that compares the TRS frequency in a given genomic region with either the frequency in the whole genome or the frequency in exon regions. This model was applied to different classes of non-coding genomic regions in four genomes. Most TRS were over-represented in the intergenic regions of the Saccharomyces cerevisiae, Schizosaccharomyces pombe and Epstein-Barr virus (EBV) genomes. The enrichment of S.cerevisiae TRS in the 600 bp upstream regions of genes was also significant. In the Drosophila genome, TRS did not show enrichment in intergenic and intron regions when the whole-genome TRS frequency was taken as background, as was done for the other genomes. However, when the TRS frequency in exon regions was taken as background, over 70% of TRS were over-represented in those two classes of non-coding regions. This result indicates the existence of transcriptional regulatory signals in introns. The analysis of some S.cerevisiae TRS, whose inconsistent consensus sequences show different levels of enrichment in intergenic regions, suggests that the accuracy of experimentally determined TRS could be evaluated by measuring their enrichment in non-coding genomic regions.
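
The comparison of a site's frequency in a region against a background can be illustrated with a minimal Python sketch. The abstract does not specify the statistic used, so the simple frequency ratio below is an assumption for illustration only, not the authors' model. The function names `count_sites` and `enrichment_ratio` and the toy sequences are hypothetical.

```python
def count_sites(seq: str, site: str) -> tuple[int, int]:
    """Return (occurrences of site, number of k-length windows) in seq.

    Overlapping occurrences are counted; matching is exact and
    case-insensitive. Only the given strand is scanned.
    """
    seq, site = seq.upper(), site.upper()
    k = len(site)
    windows = max(len(seq) - k + 1, 0)
    hits = sum(1 for i in range(windows) if seq[i:i + k] == site)
    return hits, windows


def enrichment_ratio(region: str, background: str, site: str) -> float:
    """Ratio of the per-window site frequency in region to that in background.

    A value above 1 means the site is more frequent in the region than in
    the background (e.g. whole genome or exons). This ratio is an assumed,
    simplified stand-in for a statistical background model.
    """
    hits_r, win_r = count_sites(region, site)
    hits_b, win_b = count_sites(background, site)
    if win_r == 0 or win_b == 0:
        raise ValueError("sequences must be at least as long as the site")
    if hits_b == 0:
        raise ValueError("site absent from background; ratio undefined")
    return (hits_r / win_r) / (hits_b / win_b)


# Toy example with made-up sequences (not real genomic data).
region = "TGACTCATTTGACTCA"              # 2 hits in 10 windows
background = "TGACTCA" + "A" * 13        # 1 hit in 14 windows
ratio = enrichment_ratio(region, background, "TGACTCA")
```

Swapping the `background` argument between a whole-genome sequence and a concatenation of exon sequences mirrors the two background choices described above. A real analysis would also need to handle degenerate consensus symbols, both strands, and a significance test.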

[Indexed for MEDLINE]