Next: 1.3 Where can we
Up: 1 Introduction
Previous: 1.1 What does blastall
Contents
Index
1.2 How does blastall find the alignment?
BLAST finds the optimal alignment by using the 'word matching algorithm',
in which BLAST does the search in several distinctive phases:
- generating overlapping words from the input query;
- scanning the database for word matches (hits); and
- extending word hits to produce (local) alignments through three steps
of extension
During the first phase, BLAST breaks the input query into short overlapping
segments, or 'words', and stores them in a hash table. BLAST takes
those query words and scans the target database for initial matches
in the second phase. The nucleotide BLAST algorithm looks for any
single exact word match (with the exception of discontiguous mode
in megablast). The protein BLAST algorithm uses a scoring matrix and
a scoring threshold (neighborhood score threshold) to identify matches.
Protein BLAST algorithm also requires two word hits within a certain
distance in order to proceed to the next extension phase. In the third
phase, initial matches or word hits are used as seeds of alignment
extension. This phase is divided into three separate steps. The first
step is the un-gapped extension, in which BLAST extends the initial
word hits in both directions to generate initial alignments without
allowing the introduction of gaps. The second step is the gapped extension,
in which BLAST attempts to extend the initial un-gapped alignments
further by allowing gapping in the new extension step. BLAST only
keeps track of the start and end positions of a given alignment without
saving the alignment details. In the last step, BLAST performs a final
gapped extension with trace-back to generate the actual alignments.
It is important to note that limits on the number of alignments (-b),
number of descriptions (-v), and e-value cut-off (-e) are applied
at each of the three extension steps. Very stringent settings may
cause BLAST to miss some good hits since the initial un-gapped HSPs
may fall below cutoff and not be carried over to the gapped extension
step.
Next: 1.3 Where can we
Up: 1 Introduction
Previous: 1.1 What does blastall
Contents
Index
Tao Tao
2006-12-29