Int J Bioinform Res Appl. 2014;10(4-5):384-408. doi: 10.1504/IJBRA.2014.062991.

Searching for repeats, as an example of using the generalised Ruzzo-Tompa algorithm to find optimal subsequences with gaps.

Author information

Computational Biology Branch, National Center for Biotechnology Information, Bethesda, MD 20894, USA.

Abstract

Some biological sequences contain subsequences of unusual composition; e.g. some proteins contain DNA binding domains, transmembrane regions and charged regions, and some DNA sequences contain repeats. The linear-time Ruzzo-Tompa (RT) algorithm finds subsequences of unusual composition, using a sequence of scores as input and the corresponding 'maximal segments' as output. In principle, permitting gaps in the output subsequences could improve sensitivity. Here, the input of the RT algorithm is generalised to a finite, totally ordered, weighted graph, so the algorithm locates paths of maximal weight through increasing but not necessarily adjacent vertices. By permitting the penalised deletion of unfavourable letters, the generalisation therefore includes gaps. The program RepWords, which finds inexact simple repeats in DNA, exemplifies the general concepts by out-performing a similar extant, ad hoc tool. With minimal programming effort, the generalised Ruzzo-Tompa algorithm could improve the performance of many programs for finding biological subsequences of unusual composition.
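
As an illustration of the ungapped starting point described above, the following is a minimal Python sketch of the original Ruzzo-Tompa procedure, which takes a sequence of scores and returns its maximal segments. It is not the generalised graph algorithm of the paper and is not taken from RepWords. The function name `maximal_segments` and the example scores are assumptions chosen for illustration. For brevity the sketch rescans the candidate list on every positive score, so it does not achieve the linear running time of the published algorithm.

```python
def maximal_segments(scores):
    """Return the maximal segments of `scores` as (start, end, score) triples.

    `start` is inclusive and `end` is exclusive. Illustrative sketch only:
    the candidate list is rescanned naively, so this is not linear-time.
    """
    # Each candidate is [start, end, L, R], where L is the cumulative score
    # just before the segment and R is the cumulative score at its end.
    candidates = []
    total = 0
    for i, s in enumerate(scores):
        before = total
        total += s
        if s <= 0:
            continue  # non-positive scores never start a candidate
        cand = [i, i + 1, before, total]
        while True:
            # Find the rightmost earlier candidate j with L_j < L of cand.
            j = len(candidates) - 1
            while j >= 0 and candidates[j][2] >= cand[2]:
                j -= 1
            if j < 0 or candidates[j][3] >= cand[3]:
                # No such j, or j already reaches at least as high: keep cand.
                candidates.append(cand)
                break
            # Otherwise extend cand leftwards to the start of candidate j,
            # discard candidates j onwards, and reconsider the merged segment.
            cand = [candidates[j][0], cand[1], candidates[j][2], cand[3]]
            del candidates[j:]
    return [(a, b, r - l) for a, b, l, r in candidates]


example = [4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5]
print(maximal_segments(example))
# → [(0, 1, 4), (2, 3, 3), (4, 11, 7)]
```

In the example, the last segment spans several negative scores because its total of 7 exceeds that of any of its sub-segments. The generalisation described in the abstract would additionally allow a path to skip unfavourable positions at a penalty, which this sketch does not attempt.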

KEYWORDS:

DNA sequences; biological sequences; gaps; generalised Ruzzo–Tompa algorithm; optimal subsequences; repeats; unusual composition

PMID: 24989859
PMCID: PMC4135518
DOI: 10.1504/IJBRA.2014.062991
[Indexed for MEDLINE]
Free PMC Article