Format

Send to

Choose Destination
BMC Bioinformatics. 2010 Oct 26;11 Suppl 8:S1. doi: 10.1186/1471-2105-11-S8-S1.

Efficient motif finding algorithms for large-alphabet inputs.

Author information

1
Department of Computer Science, Rutgers University, Piscataway, NJ 08854, USA. pkuksa@cs.rutgers.edu

Abstract

BACKGROUND:

We consider the problem of identifying motifs, recurring or conserved patterns, in the biological sequence data sets. To solve this task, we present a new deterministic algorithm for finding patterns that are embedded as exact or inexact instances in all or most of the input strings.

RESULTS:

The proposed algorithm (1) improves search efficiency compared to existing algorithms, and (2) scales well with the size of alphabet. On a synthetic planted DNA motif finding problem our algorithm is over 10× more efficient than MITRA, PMSPrune, and RISOTTO for long motifs. Improvements are orders of magnitude higher in the same setting with large alphabets. On benchmark TF-binding site problems (FNP, CRP, LexA) we observed reduction in running time of over 12×, with high detection accuracy. The algorithm was also successful in rapidly identifying protein motifs in Lipocalin, Zinc metallopeptidase, and supersecondary structure motifs for Cadherin and Immunoglobin families.

CONCLUSIONS:

Our algorithm reduces computational complexity of the current motif finding algorithms and demonstrate strong running time improvements over existing exact algorithms, especially in important and difficult cases of large-alphabet sequences.

PMID:
21034426
PMCID:
PMC2966288
DOI:
10.1186/1471-2105-11-S8-S1
[Indexed for MEDLINE]
Free PMC Article

Supplemental Content

Full text links

Icon for BioMed Central Icon for PubMed Central
Loading ...
Support Center