next up previous contents index
Next: 1.3 Where can we Up: 1 Introduction Previous: 1.1 What does blastall   Contents   Index


1.2 How does blastall find the alignment?

BLAST finds the optimal alignment by using the 'word matching algorithm', in which BLAST does the search in several distinctive phases:

  1. generating overlapping words from the input query;
  2. scanning the database for word matches (hits); and
  3. extending word hits to produce (local) alignments through three steps of extension
During the first phase, BLAST breaks the input query into short overlapping segments, or 'words', and stores them in a hash table. BLAST takes those query words and scans the target database for initial matches in the second phase. The nucleotide BLAST algorithm looks for any single exact word match (with the exception of discontiguous mode in megablast). The protein BLAST algorithm uses a scoring matrix and a scoring threshold (neighborhood score threshold) to identify matches. Protein BLAST algorithm also requires two word hits within a certain distance in order to proceed to the next extension phase. In the third phase, initial matches or word hits are used as seeds of alignment extension. This phase is divided into three separate steps. The first step is the un-gapped extension, in which BLAST extends the initial word hits in both directions to generate initial alignments without allowing the introduction of gaps. The second step is the gapped extension, in which BLAST attempts to extend the initial un-gapped alignments further by allowing gapping in the new extension step. BLAST only keeps track of the start and end positions of a given alignment without saving the alignment details. In the last step, BLAST performs a final gapped extension with trace-back to generate the actual alignments.

It is important to note that limits on the number of alignments (-b), number of descriptions (-v), and e-value cut-off (-e) are applied at each of the three extension steps. Very stringent settings may cause BLAST to miss some good hits since the initial un-gapped HSPs may fall below cutoff and not be carried over to the gapped extension step.


next up previous contents index
Next: 1.3 Where can we Up: 1 Introduction Previous: 1.1 What does blastall   Contents   Index
Tao Tao 2006-12-29