Modeling transcription factor binding sites with Gibbs Sampling and Minimum Description Length encoding.

Author information

jschug@cbil.humgen.upenn.edu

Abstract

Transcription factors, proteins required for the regulation of gene expression, recognize and bind short stretches of DNA on the order of 4 to 10 bases in length. In general, each factor recognizes a family of "similar" sequences rather than a single unique sequence. Ultimately, the transcriptional state of a gene is determined by the cooperative interaction of several bound factors. We have developed a method using Gibbs Sampling and the Minimum Description Length principle for automatically and reliably creating weight matrix models of binding sites from a database (TRANSFAC) of known binding site sequences. Determining the relationship between sequence and binding affinity for a particular factor is an important first step in predicting whether a given uncharacterized sequence is part of a promoter site or other control region. Here we describe the foundation for the methods we will use to develop weight matrix models for transcription factor binding sites.
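The abstract names two building blocks: weight matrix models of binding sites and Gibbs sampling over candidate site positions. The sketch below is not the authors' implementation. It illustrates the generic site-sampling form of Gibbs sampling that produces a log-odds weight matrix, and it makes several assumptions. Each sequence holds exactly one site of a known width. The background is uniform over A/C/G/T. One pseudocount is added per base. The Minimum Description Length step is not modeled. All function names (`weight_matrix`, `score`, `gibbs_sample`) are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"


def weight_matrix(sites, background=None, pseudocount=1.0):
    """Log-odds (base 2) weight matrix from equal-length aligned sites.

    Entry [i][b] is log2(p(base b at column i) / background(b)).
    Assumption: uniform background unless one is supplied.
    """
    background = background or {b: 0.25 for b in ALPHABET}
    width = len(sites[0])
    matrix = []
    for i in range(width):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        matrix.append({b: math.log2((counts[b] / total) / background[b]) for b in ALPHABET})
    return matrix


def score(matrix, kmer):
    """Sum of per-column log-odds scores for a k-mer of the matrix width."""
    return sum(matrix[i][base] for i, base in enumerate(kmer))


def gibbs_sample(seqs, width, iterations=500, seed=0):
    """Site sampler: one site per sequence, resampled one sequence at a time.

    Returns the sampled start position of the site in each sequence.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    rng = random.Random(seed)
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        i = rng.randrange(len(seqs))
        # Build the model from every sequence except the one being resampled.
        others = [s[p:p + width] for j, (s, p) in enumerate(zip(seqs, positions)) if j != i]
        matrix = weight_matrix(others)
        seq = seqs[i]
        # Weight each candidate start by its motif/background likelihood ratio.
        weights = [2 ** score(matrix, seq[k:k + width]) for k in range(len(seq) - width + 1)]
        positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions
```

As a usage example, `gibbs_sample(["AAAATATAAA", "CCTATACCCC", "GGGGGTATAG"], 4)` returns one start index per sequence. Passing the corresponding 4-mers to `weight_matrix` gives a matrix that can score any uncharacterized 4-mer with `score`. Leaving one sequence out while rebuilding the matrix is what makes this a Gibbs update: each position is drawn from its conditional distribution given all the others.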

PMID: 9322048 [Indexed for MEDLINE]