Approximate matching of network expressions with spacers

J Comput Biol. 1996 Spring;3(1):33-51. doi: 10.1089/cmb.1996.3.33.

Abstract

Two algorithmic results are presented that are pertinent to the matching of patterns typically used by biologists to describe regions of macromolecular sequences that encode a given function. The first result is a threshold-sensitive algorithm for approximately matching both network and regular expressions. Network expressions are regular expressions that can be composed only from union and concatenation operators. Kleene closure (i.e., unbounded repetition) is not permitted. The algorithm is threshold-sensitive in that its performance depends on the threshold, k, of the number of differences allowed in an approximate match. This result generalizes the O(kn) expected-time algorithm of Ukkonen for approximately matching keywords. The second result concerns the problem of matching a pattern that is a network expression whose elements are approximate matches to network or regular expressions interspersed with specifiable distance ranges. For this class of patterns, it is shown how to determine a backtracking procedure whose order of evaluation is optimal in the sense that its expected time is minimal over all such procedures.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Computer Simulation
  • Models, Genetic*
  • Molecular Sequence Data
  • Proteins / chemistry
  • Sequence Alignment

Substances

  • Proteins