

6.4 Filter Strings: Functions and Alignment Display

In standalone commandline BLAST, the filter parameter (-F) takes a string as input. The function of this parameter is to mask regions of the query sequence which have biased composition and/or unclear biological significance. This in turn allows BLAST to focus more on finding the biologically meaningful matches to the remainder of the query.

The arguments accepted by the '-F' parameter include: T, F, D, L, R, V, S, C, and m. T and F turn the default filtering on and off, respectively. L stands for low complexity, D for DUST (nucleotide low-complexity filter), R for human repeats, V for vector, S for SEG (protein low-complexity filter), and C for coiled-coil. The input m to '-F' stands for masking for the lookup table only: BLAST masks the query during the lookup stage but is allowed to extend through the masked region during alignment extension.

S (SEG) has other user-specifiable values. For example, -F "S 10 1.0 1.5" means to use the SEG filter with a window of 10, a low cut of 1.0, and a high cut of 1.5.

C (COIL) also has user-specifiable values ([3], [6]). For example, -F "C 28 40 32" stands for the COIL filter with a window of 28, a cutoff of 40, and a linker of 32.

To run the SEG and COIL filters together, use: -F "S; C"

To mask the lookup table only without affecting the extension, add m: -F "m S; C"

To mask human repeat sequences use: -F R or -F "m R"

To combine with the low complexity filter, use: -F "m L;R"

To mask vector sequences, use: -F "V"
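The filter strings above follow a small grammar: an optional leading m, then one or more terms separated by semicolons, where a term is a letter optionally followed by its numeric values. The following is a minimal Python sketch of that composition, not part of BLAST itself. The helper names (filter_term, filter_string, blastall_argv) are hypothetical, and the program, database, and query file in the usage example are placeholders. It assumes the legacy blastall executable with its -p, -d, and -i options.

```python
import shlex


def filter_term(code, *values):
    """Build one filter term, e.g. 'S' or 'S 10 1.0 1.5'."""
    return " ".join([code, *map(str, values)])


def filter_string(terms, lookup_only=False):
    """Join terms with ';' and prepend 'm' for lookup-table-only masking."""
    body = ";".join(terms)
    return f"m {body}" if lookup_only else body


def blastall_argv(program, database, query_path, filter_arg):
    """Return an argument list; the filter string stays a single argument."""
    return ["blastall", "-p", program, "-d", database,
            "-i", query_path, "-F", filter_arg]


if __name__ == "__main__":
    # Placeholder program, database, and query file (assumptions).
    seg_and_coil = filter_string([filter_term("S"), filter_term("C")],
                                 lookup_only=True)
    argv = blastall_argv("blastp", "nr", "query.fasta", seg_and_coil)
    # shlex.join quotes the filter string so a shell sees one argument.
    print(shlex.join(argv))
```

Passing the command as an argument list (for example to subprocess.run) avoids shell quoting problems, since filter strings contain spaces and semicolons.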

The BLAST URLAPI accepts multiple filter options, but each option must be given as its own 'FILTER=value' pair. Some of the filter strings available in standalone commandline BLAST are not available to the FILTER parameter of the BLAST URLAPI. One is the coiled-coil filter (C) for protein queries; another, for nucleotide searches, is vector masking (V). An unofficial workaround for the latter is to incorporate '...&FILTER=R+-d+UniVec&...' in the URL, which forces BLAST to do 'repeat' masking with the UniVec library.
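As an illustration of repeated 'FILTER=value' pairs, the sketch below builds a query string with Python's standard urllib.parse.urlencode. Passing a list of tuples keeps the duplicate FILTER keys, which a dictionary would collapse. The function names are hypothetical, and the CMD=Put parameter in the usage example is an assumption that is not taken from this section.

```python
from urllib.parse import urlencode


def filter_pairs(values):
    """One ('FILTER', value) tuple per filter option."""
    return [("FILTER", value) for value in values]


def build_query(base_params, filters):
    """Encode base parameters followed by one FILTER pair per option.

    urlencode uses quote_plus by default, so spaces become '+'.
    """
    return urlencode(list(base_params) + filter_pairs(filters))


if __name__ == "__main__":
    # 'R -d UniVec' is the unofficial vector-masking workaround from the text.
    print(build_query([("CMD", "Put")], ["L", "R -d UniVec"]))
```

The encoded value of 'R -d UniVec' matches the 'FILTER=R+-d+UniVec' form quoted in the text.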

Some species-specific repeat filters are available but require a special filter string. For example, to call the rodent repeat library, use '...&FILTER=R+-d+rodents.lib&...'. Users can also use the REPEATS parameter to access other species-specific repeat libraries, which are listed in Table 6.4 below.

Table 6.4 Other Species Specific Repeat Libraries Available

Taxa Name                   Library Name    Usage in URL
Chlamydomonas reinhardtii   repeat_3055     FILTER=R&REPEATS=repeat_3055
Fugu                        repeat_31032    FILTER=R&REPEATS=repeat_31032
Thalassiosira pseudonana    repeat_35128    FILTER=R&REPEATS=repeat_35128
Arabidopsis thaliana        repeat_3702     FILTER=R&REPEATS=repeat_3702
Mammalia                    repeat_40674    FILTER=R&REPEATS=repeat_40674
Oryza sativa                repeat_4530     FILTER=R&REPEATS=repeat_4530
Fungi                       repeat_4751     FILTER=R&REPEATS=repeat_4751
Caenorhabditis briggsae     repeat_6238     FILTER=R&REPEATS=repeat_6238
Caenorhabditis elegans      repeat_6239     FILTER=R&REPEATS=repeat_6239
Anopheles gambiae           repeat_7165     FILTER=R&REPEATS=repeat_7165
Drosophila melanogaster     repeat_7227     FILTER=R&REPEATS=repeat_7227
Ciona intestinalis          repeat_7719     FILTER=R&REPEATS=repeat_7719
Danio rerio                 repeat_7955     FILTER=R&REPEATS=repeat_7955
Human                       repeat_9606     FILTER=R&REPEATS=repeat_9606
Rodentia                    repeat_9989     FILTER=R&REPEATS=repeat_9989
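The table maps each taxon to a library name and the corresponding FILTER/REPEATS pair. A small lookup such as the following sketch can generate that pair from a taxon name. The dictionary simply transcribes the table; the names REPEAT_LIBRARIES and repeat_params are hypothetical.

```python
from urllib.parse import urlencode

# Taxon name -> repeat library name, transcribed from the table above.
REPEAT_LIBRARIES = {
    "Chlamydomonas reinhardtii": "repeat_3055",
    "Fugu": "repeat_31032",
    "Thalassiosira pseudonana": "repeat_35128",
    "Arabidopsis thaliana": "repeat_3702",
    "Mammalia": "repeat_40674",
    "Oryza sativa": "repeat_4530",
    "Fungi": "repeat_4751",
    "Caenorhabditis briggsae": "repeat_6238",
    "Caenorhabditis elegans": "repeat_6239",
    "Anopheles gambiae": "repeat_7165",
    "Drosophila melanogaster": "repeat_7227",
    "Ciona intestinalis": "repeat_7719",
    "Danio rerio": "repeat_7955",
    "Human": "repeat_9606",
    "Rodentia": "repeat_9989",
}


def repeat_params(taxon):
    """Return the FILTER/REPEATS pairs for a taxon listed in the table."""
    try:
        library = REPEAT_LIBRARIES[taxon]
    except KeyError:
        raise ValueError(f"no repeat library listed for {taxon!r}") from None
    return [("FILTER", "R"), ("REPEATS", library)]


if __name__ == "__main__":
    print(urlencode(repeat_params("Danio rerio")))
```

The resulting fragment can be appended to the other URL parameters of a search request.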

In standalone wwwblast and commandline BLAST, repeat filter settings (-F R or -F "R -drodent.lib") require special repeat libraries. NCBI does not have the right to redistribute these libraries. To obtain species-specific repeat libraries, please visit the home page of the Genetic Information Research Institute (http://www.girinst.org/).

Unlike standalone wwwblast or commandline BLAST, the BLAST URLAPI can display filter-masked regions (low complexity, repeats, or user-selected regions given in lowercase with 'LCASE_MASK=on') in multiple formats through the 'MASK_CHAR' and 'MASK_COLOR' parameters provided by the formatter. Combining the two parameters lets users see the actual masked regions, for example as colored lowercase letters. An example of an alignment containing a masked region is given below (6.11).


Tao Tao 2007-08-03