Nucleic Acids Res. 2005 Mar 10;33(5):1445-53. Print 2005.

NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence.

Author information

  • Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK. td2@sanger.ac.uk

Abstract

NestedMICA is a new, scalable pattern-discovery system for finding transcription factor binding sites and similar motifs in biological sequences. Like several previous methods, NestedMICA tackles this problem by optimizing a probabilistic mixture model to fit a set of sequences. However, the use of a newly developed inference strategy called Nested Sampling means that NestedMICA is able to find optimal solutions without the need for a problematic initialization or seeding step. We investigate the performance of NestedMICA in a range of scenarios, on synthetic data and on a well-characterized set of muscle regulatory regions, and compare it with the popular MEME program. We show that the new method is significantly more sensitive than MEME: in one case, it successfully extracted a target motif from background sequence four times longer than could be handled by the existing program. It also performs robustly on synthetic sequences containing multiple significant motifs. When tested on a real set of regulatory sequences, NestedMICA produced motifs which were good predictors for all five abundant classes of annotated binding sites.
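To illustrate the general idea behind Nested Sampling, the sketch below runs the basic algorithm on a toy problem. It is not NestedMICA's implementation and does not model sequence motifs: the uniform prior on [0, 1], the one-dimensional Gaussian likelihood, and all function and parameter names are assumptions chosen for illustration. It shows the point made in the abstract, that the live points are simply drawn from the prior, so no seeding step is needed.

```python
import math
import random


def log_likelihood(theta, sigma=0.05):
    """Toy log-likelihood (an assumption): unnormalized Gaussian centred on 0.5."""
    return -0.5 * ((theta - 0.5) / sigma) ** 2


def log_add(a, b):
    """Return log(exp(a) + exp(b)) without overflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def nested_sampling_log_evidence(n_live=200, n_iter=2000, seed=0):
    """Estimate log Z, where Z is the integral of the likelihood over a uniform prior on [0, 1]."""
    rng = random.Random(seed)
    # Live points come straight from the prior: no initial guess is required.
    live = [rng.random() for _ in range(n_live)]
    log_z = -math.inf
    log_x_prev = 0.0  # log of the prior volume still enclosed (starts at 1)

    for i in range(1, n_iter + 1):
        # Find the live point with the lowest likelihood.
        worst = min(range(n_live), key=lambda k: log_likelihood(live[k]))
        log_l = log_likelihood(live[worst])

        # On average the enclosed prior volume shrinks by exp(-1/n_live) per step.
        log_x = -i / n_live
        log_w = math.log(math.exp(log_x_prev) - math.exp(log_x))
        log_z = log_add(log_z, log_l + log_w)

        # Replace the worst point with a prior draw of higher likelihood.
        # For this symmetric toy likelihood the allowed region is an interval
        # around 0.5, so it can be sampled directly.
        r = abs(live[worst] - 0.5)
        live[worst] = 0.5 + rng.uniform(-r, r)
        log_x_prev = log_x

    # Add the contribution of the points still alive at the end.
    for theta in live:
        log_z = log_add(log_z, log_likelihood(theta) + log_x_prev - math.log(n_live))
    return log_z


if __name__ == "__main__":
    estimate = nested_sampling_log_evidence()
    exact = math.log(0.05 * math.sqrt(2 * math.pi))
    print(f"estimated log Z = {estimate:.3f}, analytic log Z = {exact:.3f}")
```

In a realistic setting the constrained draw is the hard part and is usually done with a Monte Carlo move rather than the direct interval sampling used here, which works only because the toy likelihood is one-dimensional and symmetric.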

PMID: 15760844 [PubMed - indexed for MEDLINE]
PMCID: PMC1064142 (Free PMC Article)