Regulatory element detection using a probabilistic segmentation model.

Author information

  • Swammerdam Institute for Life Sciences and Amsterdam Center for Computational Science, University of Amsterdam, The Netherlands. bussemaker@bio.uva.nl


The availability of genome-wide mRNA expression data for organisms whose genome is fully sequenced provides a unique data set from which to decipher how transcription is regulated by the upstream control region of a gene. A new algorithm is presented which decomposes DNA sequence into the most probable "dictionary" of motifs or words. Identification of words is based on a probabilistic segmentation model in which the significance of longer words is deduced from the frequency of shorter words of various lengths. This eliminates the need for a separate set of reference data to define probabilities, and genome-wide applications are therefore possible. For the 6,000 upstream regulatory regions in the yeast genome, the 500 strongest motifs from a dictionary of size 1,200 match at a significance level of 15 standard deviations to a database of cis-regulatory elements. Analysis of sets of genes such as those up-regulated during sporulation reveals many new putative regulatory sites in addition to identifying previously known sites.
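
The abstract gives no formulas, so the following is only a minimal sketch of what a probabilistic segmentation model can look like, not the authors' implementation. It assumes a sequence is generated by concatenating words drawn independently from a dictionary, so that the likelihood of a sequence is the sum, over every way of cutting it into dictionary words, of the product of the word probabilities. The function name `segmentation_likelihood` and the toy dictionary are hypothetical and chosen for illustration.

```python
def segmentation_likelihood(seq, word_probs):
    """Sum over all segmentations of `seq` into dictionary words.

    Assumption (not from the abstract): words are drawn independently,
    so each segmentation contributes the product of its word probabilities.
    """
    max_len = max(len(word) for word in word_probs)
    # z[i] holds the total weight of all segmentations of seq[:i].
    z = [0.0] * (len(seq) + 1)
    z[0] = 1.0  # the empty prefix has exactly one (empty) segmentation
    for i in range(1, len(seq) + 1):
        # Try every dictionary word that could end at position i.
        for k in range(1, min(max_len, i) + 1):
            p = word_probs.get(seq[i - k:i])
            if p:
                z[i] += z[i - k] * p
    return z[len(seq)]


# Hypothetical toy dictionary: two single letters and one longer word.
toy_dictionary = {"A": 0.5, "T": 0.3, "AT": 0.2}

# "AT" can be read as A|T (0.5 * 0.3) or as the single word AT (0.2),
# so the total is about 0.35.
print(segmentation_likelihood("AT", toy_dictionary))
```

In a sketch like this, a longer word such as "AT" earns a place in the dictionary only if it occurs more often than the concatenation of its shorter parts would predict, which is one way to read the abstract's statement that the significance of longer words is deduced from the frequency of shorter ones.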

[PubMed - indexed for MEDLINE]