

6.5 MEGABLAST Features

MEGABLAST uses a greedy algorithm ([8]) to perform nucleotide sequence searches. This program is optimized for aligning sequences that differ slightly as a result of sequencing errors or naturally occurring variations. When a larger word size is used (see explanation below), it is up to 10 times faster than the traditional blastn program. MEGABLAST is also able to efficiently handle much longer DNA sequences than the traditional BLAST algorithm.

Word size is roughly the minimal length of an identical match that an alignment must contain if it is to be found by the algorithm. MEGABLAST is most efficient with word sizes of 16 and larger. For 'MEGABLAST=on', the default word size is 28, with a lower limit of 12. If the word size W is divisible by 4, all perfect matches of length W + 3 are guaranteed to be found and extended by the MEGABLAST search algorithm. Perfect matches as short as W might also be found, although this is not guaranteed. Any value of W not divisible by 4 is equivalent to the nearest value divisible by 4 (with 4i+2 equivalent to 4i).
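The rounding rule for word sizes can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not code taken from MEGABLAST; the function names effective_word_size and guaranteed_match_length are hypothetical, and applying the W + 3 guarantee to the rounded value is an assumption drawn from the statement that such values are "equivalent".

```python
MIN_WORD_SIZE = 12  # lower limit stated in the text


def effective_word_size(w: int) -> int:
    """Round w to the nearest multiple of 4; 4i+2 is treated as 4i."""
    if w < MIN_WORD_SIZE:
        raise ValueError("word size is below the lower limit of 12")
    remainder = w % 4
    if remainder == 3:
        return w + 1          # 4i+3 is nearest to 4i+4
    return w - remainder      # 4i, 4i+1 and 4i+2 all map to 4i


def guaranteed_match_length(w: int) -> int:
    """Perfect-match length that is guaranteed to be found (W + 3)."""
    return effective_word_size(w) + 3


if __name__ == "__main__":
    for w in (28, 29, 30, 31):
        print(w, effective_word_size(w), guaranteed_match_length(w))
```

Under this sketch, word sizes 28, 29, and 30 all behave like 28, while 31 behaves like 32.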

By default, non-affine gapping parameters are assumed. This means that the gap opening penalty G is 0, and the gap extension penalty E can be computed from the match reward r and mismatch penalty q by the formula E = r / 2 - q. The non-affine version of MEGABLAST requires significantly less memory and is also significantly faster. Limited affine gapping parameters can also be used, preferably with larger word sizes (see Section 6.13.1). Non-affine gapping parameters tend to yield alignments with more gaps, but the gaps are shorter.
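As a small worked example of the formula E = r / 2 - q, the sketch below computes the non-affine gap extension penalty. The function name and the sample values r = 1 and q = -3 are illustrative assumptions, not documented defaults; the sketch also assumes that q is supplied as a negative score, so that E comes out positive.

```python
def non_affine_gap_extension(reward: float, mismatch: float) -> float:
    """Gap extension penalty E = r / 2 - q for non-affine gapping (G = 0).

    Assumes `mismatch` (q) is given as a negative score.
    """
    return reward / 2 - mismatch


if __name__ == "__main__":
    # Illustrative values only: r = 1, q = -3 gives E = 0.5 + 3 = 3.5
    print(non_affine_gap_extension(1, -3))
```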

In traditional BLAST, the X-dropoff value provides a cutoff threshold for the extension algorithm during tree exploration. When the score of a given branch drops below the current best score minus the X-dropoff, the exploration of this branch stops. However, the actual X-dropoff values for MEGABLAST and for the traditional nucleotide BLAST programs are not necessarily compatible: with the same word size, match, mismatch, and gapping penalties, and the same X-dropoff, the two programs might produce different results. This can be remedied by changing the X-dropoff value for one of the programs.
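The X-dropoff stopping rule can be illustrated with a deliberately simplified, ungapped extension over a single branch. This is a sketch of the cutoff idea only, not the gapped or greedy extension used by BLAST or MEGABLAST; the function name xdrop_extend, its parameters, and the scoring values are hypothetical.

```python
def xdrop_extend(query: str, subject: str, reward: int = 1,
                 penalty: int = -3, x_dropoff: int = 10) -> tuple[int, int]:
    """Extend an ungapped alignment left to right with an X-dropoff cutoff.

    Returns (best_score, length_of_best_extension).
    """
    score = 0
    best_score = 0
    best_length = 0
    for position, (a, b) in enumerate(zip(query, subject), start=1):
        score += reward if a == b else penalty
        if score > best_score:
            best_score, best_length = score, position
        elif score < best_score - x_dropoff:
            # Score fell below (best score - X-dropoff): stop exploring.
            break
    return best_score, best_length


if __name__ == "__main__":
    query = "ACGT" + "TT" + "ACGTACGT"
    subject = "ACGT" + "AA" + "ACGTACGT"
    print(xdrop_extend(query, subject, x_dropoff=5))
    print(xdrop_extend(query, subject, x_dropoff=10))
```

With the smaller X-dropoff the extension stops inside the two mismatches, whereas the larger value lets it continue through them to the matching region beyond. This shows how the same sequences and scores can produce different alignments when only the X-dropoff changes.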

Note: This subsection is adapted from http://www.ncbi.nlm.nih.gov/blast/megablast.html.


Tao Tao 2007-08-03