Bioinformatics. 2003;19 Suppl 1:i169-76.

Identification of functional clusters of transcription factor binding motifs in genome sequences: the MSCAN algorithm.

Author information

  • Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093-0114, USA.



The identification of regulatory control regions within genomes is a major challenge. Studies have demonstrated that regulatory regions can be described as locally dense clusters, or modules, of cis-acting transcription factor binding sites (TFBS). For well-described biological contexts, it is possible to train predictive algorithms to discern novel modules in genome sequences. However, the utility of module detection methods has been severely limited by insufficient training data: only for a few tissues can one obtain sufficient numbers of literature-derived regulatory modules.


We present a novel method, MSCAN, that circumvents the training data problem by measuring the statistical significance of any non-overlapping combination of TFBS in a window. Given a set of transcription factor binding profiles, a significance threshold, and a genomic sequence, MSCAN returns putative regulatory regions. We assess performance on two curated collections of regulatory regions, one each for tissue-specific expression in liver and skeletal muscle cells. The efficiency of MSCAN allows for predictive screens of entire genomes.
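
The input/output contract described above (binding profiles, a significance threshold, and a sequence in; putative regions out) can be illustrated with a minimal Python sketch. This is not the MSCAN implementation: the abstract does not specify its significance statistic, so a Poisson tail probability on the count of non-overlapping hits per window is used here as a stand-in. All function names, the toy profile format (a list of per-position score dictionaries), the mismatch penalty, and the `hit_rate` background parameter are assumptions made for illustration only.

```python
import math


def scan_profile(seq, name, pwm, min_score):
    """Return (start, end, name) for each position where the profile scores >= min_score.

    `pwm` is a hypothetical toy profile: one dict of base -> score per column.
    Bases missing from a column get an assumed penalty of -10.0.
    """
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        site = seq[i:i + width]
        score = sum(col.get(base, -10.0) for col, base in zip(pwm, site))
        if score >= min_score:
            hits.append((i, i + width, name))
    return hits


def max_nonoverlapping(hits):
    """Greedy interval scheduling: a largest set of mutually non-overlapping hits."""
    chosen, last_end = [], 0
    for start, end, name in sorted(hits, key=lambda h: h[1]):
        if start >= last_end:
            chosen.append((start, end, name))
            last_end = end
    return chosen


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam); the stand-in significance measure."""
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


def scan_windows(seq, profiles, window, hit_rate, alpha):
    """Report windows whose non-overlapping hit count is unlikely under the background.

    profiles: name -> (pwm, min_score); hit_rate: assumed background hits per base.
    Returns a list of (window_start, window_end, hit_count, p_value).
    """
    hits = []
    for name, (pwm, min_score) in profiles.items():
        hits.extend(scan_profile(seq, name, pwm, min_score))
    lam = hit_rate * window
    regions = []
    for w_start in range(max(1, len(seq) - window + 1)):
        w_end = w_start + window
        inside = [h for h in hits if h[0] >= w_start and h[1] <= w_end]
        k = len(max_nonoverlapping(inside))
        p = poisson_tail(k, lam)
        if k > 0 and p <= alpha:
            regions.append((w_start, w_end, k, p))
    return regions


# Toy usage: one made-up profile that matches "TATA" exactly.
tata = [{"T": 1.0}, {"A": 1.0}, {"T": 1.0}, {"A": 1.0}]
sequence = "G" * 20 + "TATACCTATACCTATA" + "G" * 20
regions = scan_windows(sequence, {"TATA": (tata, 4.0)},
                       window=20, hit_rate=0.01, alpha=0.01)
```

The sketch recomputes the hit set for every window, which is quadratic in the worst case; a genome-scale screen of the kind the abstract mentions would need an incremental scheme, which is not attempted here.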
