Send to:

Choose Destination
See comment in PubMed Commons below

A distance-based block searching algorithm.

Author information

  • 1Atelier de BioInformatique, CPASO URA CNRS 448, Section de Physique et Chimie de l'Institut Curie, Paris.


We present in this paper an algorithm for the multiple comparison of a set of protein sequences. Our approach is that of peptide matching and consists in looking for all the words that occur approximatively in at least q of the sequences in the set, where q is a parameter. Words are compared by using a reference object called a model, that is itself a word over the alphabet of the amino acids, and the comparison between a model and a word is based on w-length words instead of single symbols. This idea is similar to the one used in the Blast program in the case of pairwise comparisons. Two w-length words are considered to be related if an alignment without gaps of the two using a similarity matrix has a score greater than a certain threshold value t. In our case, we say that a k-length word u is an occurrence of a model m of the same length if every w-length subword of u is related to the corresponding subword of m in the sense given above. If a model m has occurrences in at least q of the sequences of the set, m is said to occur in the set. In percentage terms, the value of q may correspond to something as small as 5% of the sequences (search for recurrent words in a set of non homologous proteins) or as high as 70-100% (establishment of a list of all similar words as a first step in a multiple alignment program). The algorithm presented here is an efficient and exact way of looking for all the models, of a fixed length k or of the greatest possible length kmax, that occur in a set of sequences. It can work with any kind of scoring matrix and an extension of the algorithm allows for the introduction of gaps between a model and its occurrences.

[PubMed - indexed for MEDLINE]
PubMed Commons home

PubMed Commons

How to join PubMed Commons

    Supplemental Content

    Loading ...
    Write to the Help Desk