Display Settings:

Format

Send to:

Choose Destination
See comment in PubMed Commons below
Mamm Genome. 2002 Sep;13(9):510-4.

Identification of transcription factor binding sites in the human genome sequence.

Author information

  • 1Informatics Research, Celera Corporation, Rockville, Maryland 20850, USA. samuel.levy@celera.com

Abstract

The identification of transcription factor binding sites (TFBS) is an important initial step in determining the DNA signals that regulate transcription of the genome. We tested the performance of three distinct computational methods for the identification of TFBS applied to the human genome sequence, as judged by their ability to recover the location of experimentally determined, and uniquely mapped, TFBS taken from the TRANSFAC database. These identification methods all attempt to filter the quantity of TFBS identified by aligning positional weight matrices that describe the binding site and employ either (i) a P-value threshold for accepting a site, (ii) an over-representation measure of neighboring sites, or (iii) conservation with the mouse genome and application of P-value thresholds. The results show that the best recognition of TFBS is achieved by combining the identification of TFBS in regions of human-mouse conservation and also by applying a high stringency P-value to the TFBS identified in non-coding regions that are not conserved. Additionally, we find that only half of the 481 experimentally mapped sites can be found in sequence regions conserved with mouse, but the predictive power of the binding site identification method is up to threefold higher in the conserved regions.

PMID:
12370781
[PubMed - indexed for MEDLINE]
PubMed Commons home

PubMed Commons

0 comments
How to join PubMed Commons

    Supplemental Content

    Icon for Springer
    Loading ...
    Write to the Help Desk