In standalone commandline BLAST, the filter parameter (-F) takes a string as input. The function of this parameter is to mask regions of the query sequence which have biased composition and/or unclear biological significance. This in turn allows BLAST to focus more on finding the biologically meaningful matches to the remainder of the query.
The arguments accepted by the '-F' parameter include: T, F, D, L, R, V, S, C, and m. T turns filtering on with the program's default filter and F turns it off. L stands for Low complexity, D for DUST (nucleotide low complexity filter), R for human repeats, V for Vector, S for SEG (protein low complexity filter), and C for coiled coil. The input m to '-F' stands for masking for the lookup table only: the region is masked during the lookup stage, but BLAST is allowed to extend through it during alignment extension.
S (SEG) has other user-specifiable values. For example, -F "S 10 1.0 1.5" means use the SEG filter with a window of 10, low cut of 1.0, and high cut of 1.5.
C (COIL) also has user-specifiable values ([3], [6]). For example, -F "C 28 40 32" stands for the COIL filter with a window of 28, cutoff of 40, and linker of 32.
To run the SEG and COIL filters together, use: -F "S; C"
To mask the lookup table only without affecting the extension, add m: -F "m S; C"
To mask human repeat sequences use: -F R or -F "m R"
To combine with the low complexity filter, use: -F "m L;R"
To mask vector sequences, use: -F "V"
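The filter strings above follow a simple pattern: an optional leading m, then one or more filter specifications separated by semicolons, all passed to -F as a single quoted argument. The following Python sketch assembles such strings. The helper names build_filter_string and seg are hypothetical and not part of BLAST, the "; " separator simply mirrors the "S; C" example, and the blastall options -p and -F in the usage line are only there to show the quoting.

```python
import shlex

# Filter codes listed for the -F parameter in the text above.
KNOWN_CODES = {"T", "F", "D", "L", "R", "V", "S", "C"}


def seg(window, locut, hicut):
    """Format a SEG specification, e.g. seg(10, 1.0, 1.5) -> 'S 10 1.0 1.5'."""
    return f"S {window} {locut} {hicut}"


def build_filter_string(specs, lookup_only=False):
    """Join filter specifications into one value for -F.

    specs       -- strings such as "S", "C 28 40 32" or "R"
    lookup_only -- prefix the value with "m" (mask the lookup table only)
    """
    for spec in specs:
        code = spec.split(maxsplit=1)[0] if spec.strip() else ""
        if code not in KNOWN_CODES:
            raise ValueError(f"unknown filter code in {spec!r}")
    value = "; ".join(specs)
    return f"m {value}" if lookup_only else value


if __name__ == "__main__":
    value = build_filter_string(["S", "C"], lookup_only=True)
    # The whole filter string must reach BLAST as ONE argument, hence the quoting.
    print(shlex.join(["blastall", "-p", "blastp", "-F", value]))
```

Building the argument as a list and quoting it with shlex.join keeps the spaces and semicolons inside the filter string from being interpreted by the shell.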
BLAST URLAPI accepts multiple 'FILTER=value' inputs: each option for the FILTER parameter must be specified as its own 'FILTER=value' pair. Some of the filter strings available in standalone commandline BLAST are not available to the FILTER parameter of the BLAST URLAPI. One is the coiled-coil filter (C) for protein queries, and another, for nucleotide searches, is vector masking (V). An unofficial workaround for the latter is to incorporate '...&FILTER=R+-d+UniVec&...' in the URL, which forces BLAST to do 'repeat' masking with the UniVec library.
Some species-specific repeat filters are available and require a special command. For example, to call the rodent repeat library, use '...&FILTER=R+-d+rodents.lib&...'. Users can also use the REPEATS parameter to access other species-specific repeat libraries, which are given in Table 6.3 below.
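As an illustration of the 'FILTER=value' convention, the Python sketch below encodes a list of filter options as repeated FILTER pairs. The helper name filter_query is hypothetical, and combining L with the rodent repeat library is only an assumed example; the sketch produces the query-string fragment and does not contact the server.

```python
from urllib.parse import urlencode


def filter_query(filters):
    """Encode each filter option as its own FILTER=value pair.

    Spaces are encoded as '+', so "R -d rodents.lib" becomes
    "R+-d+rodents.lib", matching the form shown in the text.
    """
    return urlencode([("FILTER", value) for value in filters])


if __name__ == "__main__":
    print(filter_query(["L", "R -d rodents.lib"]))
    # prints: FILTER=L&FILTER=R+-d+rodents.lib
```

Passing a list of pairs rather than a dictionary is what allows the same key, FILTER, to appear more than once in the encoded result.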
In standalone wwwblast and commandline BLAST, repeat filter settings (-F R or -F "R -drodent.lib") require special repeat libraries. NCBI does not have the right to redistribute these libraries. To obtain species-specific repeat libraries, please visit the home page of the Genetic Information Research Institute (http://www.girinst.org/).
Unlike standalone wwwblast or commandline BLAST, the BLAST URLAPI can display the masked regions of the query, whether they come from the low complexity filter, the repeat filter, or a user-selected region given in lowercase and enabled with 'LCASE_MASK=on'. The formatter's 'MASK_CHAR' and 'MASK_COLOR' parameters control how these regions are shown, and combining the two lets users see the actual masked regions as colored lowercase letters. An example of an alignment containing a masked region is given below (6.11).
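When masked regions are shown as lowercase letters, their coordinates can be recovered from the sequence text itself. The Python sketch below is a hypothetical helper, not part of BLAST or its formatter, and it assumes that unmasked residues are uppercase and masked residues are lowercase.

```python
import re


def masked_regions(sequence):
    """Return 1-based, inclusive (start, end) pairs for each lowercase run."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"[a-z]+", sequence)]


if __name__ == "__main__":
    print(masked_regions("ACGTacgtacgtACGTnnnn"))
    # prints: [(5, 12), (17, 20)]
```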