Format

Send to

Choose Destination
BMC Res Notes. 2012 Jul 2;5:340. doi: 10.1186/1756-0500-5-340.

Effect of positional dependence and alignment strategy on modeling transcription factor binding sites.

Author information

1
Department of Computer Science & Engineering, University of Connecticut, Storrs, 06269-2155, USA.

Abstract

BACKGROUND:

Many consensus-based and Position Weight Matrix-based methods for recognizing transcription factor binding sites (TFBS) are not well suited to the variability in the lengths of binding sites. Besides, many methods discard known binding sites while building the model. Moreover, the impact of Information Content (IC) and the positional dependence of nucleotides within an aligned set of TFBS has not been well researched for modeling variable-length binding sites. In this paper, we propose ML-Consensus (Mixed-Length Consensus): a consensus model for variable-length TFBS which does not exclude any reported binding sites.

METHODS:

We consider Pairwise Score (PS) as a measure of positional dependence of nucleotides within an alignment of TFBS. We investigate how the prediction accuracy of ML-Consensus is affected by the incorporation of IC and PS with a particular binding site alignment strategy. We perform cross-validations for datasets of six species from the TRANSFAC public database, and analyze the results using ROC curves and the Wilcoxon matched-pair signed-ranks test.

RESULTS:

We observe that the incorporation of IC and PS in ML-Consensus results in statistically significant improvement in the prediction accuracy of the model. Moreover, the existence of a core region among the known binding sites (of any length) is witnessed by the pairwise coexistence of nucleotides within the core length.

CONCLUSIONS:

These observations suggest the possibility of an efficient multiple sequence alignment algorithm for aligning TFBS, accommodating known binding sites of any length, for optimal (or near-optimal) TFBS prediction. However, designing such an algorithm is a matter of further investigation.

PMID:
22748199
PMCID:
PMC3465234
DOI:
10.1186/1756-0500-5-340
[Indexed for MEDLINE]
Free PMC Article

Supplemental Content

Full text links

Icon for BioMed Central Icon for PubMed Central
Loading ...
Support Center