Program Option for Netblast (blastcl3)
Tao Tao, Ph.D.
User Service
NCBI, NLM, NIH

1. Introduction

The NCBI BLAST web server provides a convenient and user-friendly way for individuals to search their queries against different public sequence databases. Even though this server can take multiple queries and perform batch searches, truly large-scale batch searches may not go through if the input queries are long and the search settings are not stringent. In addition, the set of databases available through the web interface is somewhat limited. The BLAST client provides a way to circumvent those limitations.

The BLAST client, or blastcl3, bypasses the web browser and interacts directly with the NCBI BLAST server that powers the NCBI web BLAST service (www.ncbi.nlm.nih.gov/BLAST/). It performs a batch search with multiple sequences by taking one query sequence at a time from the input file, formulating the search according to the command line parameters, and sending the search over the internet connection to the NCBI BLAST server for processing. The program receives the search results from the BLAST server, in the format set on the command line, and saves them to the specified local file. It loops through the queries in the input file until all have been searched.

This program has no graphical user interface (GUI) and must be executed from the command line in a terminal window. Users control the program through command line options. A detailed list of command line options is given in Section 4. For usage examples, see Section 3.

2. Installation and setup

NCBI distributes the BLAST client as an archive whose name begins with netblast. It is separate from the standalone command line package (archive names beginning with blast) and the standalone server package (archive names beginning with wwwblast). All of them can be found at:


ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/

2.1 Installation

In a Linux or Unix environment, installation is straightforward. Place the archive in the desired directory and extract it using the following command line:

tar zxvf netblast-##-**.tar.gz

The resulting netblast-#.#.# directory contains bin, doc, and data subdirectories. The program, blastcl3, is under the bin subdirectory. The matrices BLAST needs for protein alignments are under the data subdirectory, while the doc subdirectory contains netblast.html and firewall.html, which give more information on configuring blastcl3 behind firewalls.

The package for Windows can be extracted using WinZip. It does not have this directory structure.

2.2 Firewall settings

The setup for NCBI network clients has been greatly simplified. Users who are not behind a firewall can use the program right after the extraction above. Users who are behind a firewall but already use Sequin or Entrez, or whose system administrator has already performed the setup, should also be able to start searching right after installation.

If neither of the above applies, users will need to make sure that the following IP address/port combinations are open in the firewall configuration.

Table 3. Firewall Ports Needed by BLAST Client for NCBI Connection
IP Address       Port Number
130.14.29.112    5861
130.14.29.112    5862
130.14.29.112    5863
Note: Please refer to 'firewall.html' included in the package for details.
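
Before editing firewall rules, it can help to test whether the address/port combinations in the table are reachable from the local machine. The following is a minimal Python sketch, not part of the netblast package. The function name check_endpoints and the injectable connect argument are hypothetical, and the address and ports are simply copied from the table above, so they may no longer be current.

```python
import socket

# Address and ports copied from the table above; verify them against
# firewall.html in the package before relying on them.
NCBI_FIREWALL_ENDPOINTS = [
    ("130.14.29.112", 5861),
    ("130.14.29.112", 5862),
    ("130.14.29.112", 5863),
]


def check_endpoints(endpoints, timeout=5.0, connect=socket.create_connection):
    """Return a dict mapping (host, port) to True if a TCP connection opened.

    `connect` defaults to a real TCP connection attempt; it can be replaced
    with a stub so the function can be exercised without network access.
    """
    results = {}
    for host, port in endpoints:
        try:
            conn = connect((host, port), timeout)
        except OSError:
            # Covers refused connections, timeouts, and unreachable hosts.
            results[(host, port)] = False
        else:
            conn.close()
            results[(host, port)] = True
    return results
```

Calling check_endpoints(NCBI_FIREWALL_ENDPOINTS) and printing the returned dictionary shows which ports are reachable. A False entry only indicates that the TCP connection could not be opened; it does not say whether the local firewall or something else is responsible.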

In addition, users will need to create an .ncbirc file that tells blastcl3 how to connect to NCBI. This file should contain the information listed below and be placed in the home directory. On a PC running Windows, the file is named ncbi.ini and should be placed under the Windows directory.

[NCBI]
DATA=/home/johndoe/netblast-#.#.#/data

[CONN]
FIREWALL=TRUE

[NET_SERV]
SRV_CONN_MODE=SERVICE
Note: Replace the path to data directory with the path specific to your installation.

Problems may still occur while using blastcl3, and the most common cause is the firewall configuration. A representative error message would contain "[CONN_Open] Cannot open connection", "<<< Re-establishing NETBLAST Service >>>", or something along those lines.

Adding the following two lines in the .ncbirc (or ncbi.ini) file will increase the timeout setting and generate more informative messages that are useful in debugging the problem:

TIMEOUT=300
DEBUG_PRINTOUT=DATA

Search-related errors from the NCBI BLAST server are typically accompanied by the RID of the relevant search. Those RIDs should be saved and sent to NCBI at blast-help@ncbi.nlm.nih.gov for troubleshooting purposes.

As an alternative to blastcl3, the NCBI BLAST web server also supports URLAPI, which uses URL-encoded commands to interact with Blast.cgi directly, either to "Put" search requests onto the BLAST server or to "Get" search results from the same server. For details on BLAST URLAPI, please refer to:

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/

3. Practical usage examples

Before we get into actual use, we need to discuss the format of the input query. The only query format blastcl3 accepts is FASTA. In this format, each query begins with a definition line, commonly known as the defline, whose first character is a "greater than" sign (>). The defline contains a basic description of the sequence, such as its source, the gene it represents, or the identifiers assigned to it. The defline terminates with a hard return. The actual sequence immediately follows the defline on one or more lines, each of which terminates with a hard return. Multiple query sequences should be concatenated one after another. Sample query sequences are presented below for your reference.

>gi|4557757|ref|NP_000240.1| MutL protein homolog 1
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQIQDNGTGIRK
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTADGKCAYRASYSDGKLKAPPK
PCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNA
STVDNIRSIFGNAVSRELIEIGCEDKTLAFKMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVY
AAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLP
GLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDIS
SGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSSNPRKRHREDSDVEMVEDDSRKEM
TAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWALAQHQTKLYLLNTTKLSEELF
YQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEI
DEEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQ
QSEVPGSIPNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC
>gi|68348711|ref|NP_001234.2| tumor necrosis factor receptor 8
MRVLLAALGLLFLGALRAFPQDRPFEDTCHGNPSHYYDKAVRRCCYRCPMGLFPTQQCPQRPTDCRKQCE
PDYYLDEADRCTACVTCSRDDLVEKTPCAWNSSRVCECRPGMFCSTSAVNSCARCFFHSVCPAGMIVKFP
GTAQKNTVCEPASPGVSPACASPENCKEPSSGTIPQAKPTPVSPATSSASTMPVRGGTRLAQEAASKLTR
APDSPSSVGRPSSDPGLSPTQPCPEGSGDCRKQCEPDYYLDEAGRCTACVSCSRDDLVEKTPCAWNSSRT
CECRPGMICATSATNSRARCVPYPICAAETVTKPQDMAEKDTTFEAPPLGTQPDCNPTPENGEAPASTSP
TQSLLVDSQASKTLPIPTSAPVALSSTGKPVLDAGPVLFWVILVLVVVVGSSAFLLCHRRACRKRIRQKL
HLCYPVQTSQPKLELVDSRPRRSSTQLRSGASVTEPVAEERGLMSQPLMETCHSVGAAYLESLPLQDASP
AGGPSSPRDLPEPRVSTEHTNNKIEKIYIMKADTVIVGTVKAELPEGRGLAGPAEPELEEELEADHTPHY
PEQETEPPLGSCSDVMLSVEEEGKEDPLPTAASGK
Note that the file containing the query sequences must be saved as plain text.
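
To check an input file against the FASTA rules described above before submitting it, a small reader such as the following Python sketch can be used. It is an illustration only, not part of blastcl3, and the function name read_fasta is hypothetical. It reports sequence data that appears before the first defline and joins multi-line sequences into one string.

```python
def read_fasta(lines):
    """Yield (defline, sequence) pairs from an iterable of text lines.

    The defline is returned without its leading ">" and the sequence
    lines of each record are concatenated into a single string.
    """
    defline, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if defline is not None:
                yield defline, "".join(chunks)
            defline, chunks = line[1:], []
        else:
            if defline is None:
                raise ValueError("sequence data found before the first defline")
            chunks.append(line)
    if defline is not None:
        yield defline, "".join(chunks)
```

For example, opening my_query.txt in text mode and passing the file object to read_fasta yields one pair per query, so counting the pairs gives the number of searches blastcl3 would perform for that file.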

The program runs in a command or terminal window. On a PC, the command window can be launched using "Start ► Program ► Accessories ► Command Prompt". On a Mac, the Terminal program icon is usually under the Utilities folder. Double-clicking the grey icon will launch it.

In the terminal window, cd to the directory containing blastcl3, then run the program from there. Typing "blastcl3 -" without quotes followed by a return should display the command line options on the screen. On Mac and Unix/Linux platforms, type "./blastcl3 -" without quotes.

Since the list of available databases has grown significantly, we have removed it from this document. Users can find the list in this file:

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_accessible_blastdblist.html

    3.1 General nucleotide searches

The primary use of a nucleotide BLAST search is to identify the input query by checking whether exact matches are present in the database. The same approach can be used to identify the genomic counterpart of an input mRNA sequence, or vice versa. Another use is to search with primer pairs to identify the annealing target and possible secondary annealing sites.

For sequences from well-studied non-prokaryotic model organisms, a good approach is to search against the refseq_rna database with an Entrez limit. Alternatively, searching against nr, with or without a limit to the target organism, can also offer good leads.

The following example command lines search the input query file new_seq.txt against either the refseq_rna or nr database and save the result in n_refm.out and n_nr.out, respectively.

blastcl3 -p blastn -i new_seq.txt -d refseq_rna -o n_refm.out
blastcl3 -p blastn -i new_seq.txt -d nr -o n_nr.out
Note: the complete file name should be used as input to -i parameter. To see the complete extension under Windows, you will need to change the setting of the "View" tab under "Tools ► Folder Option" to uncheck "Hide extensions of known file types".

We can further adjust the settings to restrict the search to the mouse entries in those two databases using an Entrez limit, invoke the megablast algorithm, and use a lower expect value of 0.001. The last two settings increase the search stringency. The actual parameter settings in the command line are:

-u "mouse[organism]" -n T -e 0.001

For easy parsing of the BLAST search result, we can request the result be returned in either XML or "Hit Table" (tabular) format using "-m 7" or "-m 9" (without quotes) in the command line.
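
As an illustration of how the "Hit Table" output might be parsed, the Python sketch below reads tabular lines into named records. It assumes the conventional 12-column, tab-separated layout of legacy BLAST tabular output (query id, subject id, % identity, alignment length, mismatches, gap openings, query start and end, subject start and end, e-value, bit score) and that comment lines begin with "#". This layout is not spelled out in this document, so check it against the "# Fields:" comment line in your own -m 9 output. The names Hit and parse_tabular are hypothetical.

```python
from collections import namedtuple

# Field names are illustrative; the column order is the assumed layout.
Hit = namedtuple(
    "Hit",
    "query subject pct_identity length mismatches gap_opens "
    "q_start q_end s_start s_end evalue bit_score",
)


def parse_tabular(lines):
    """Parse tabular BLAST output lines into a list of Hit records.

    Blank lines and comment lines (starting with "#") are skipped.
    """
    hits = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError("expected 12 tab-separated columns: %r" % line)
        hits.append(
            Hit(
                fields[0],
                fields[1],
                float(fields[2]),
                *(int(value) for value in fields[3:10]),
                float(fields[10]),
                float(fields[11]),
            )
        )
    return hits
```

With the records in hand, filtering is straightforward, for example keeping only the hits whose evalue is at or below a chosen cutoff.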

When searching a genomic DNA sequence against a nucleotide database, we should invoke the repeat filter to mask the repeat regions and prevent the BLAST program from being inundated by spurious hits to those regions. The following two -F settings are for human and rodents, respectively.

-F "m L; R"
-F "R -d rodent.lib"

Other species-specific repeat filters are also available. The command line convention is:

-F "R -d repeat/taxid_repeat"

Refer to the end of Table 2 in the "Remotely Accessible BLAST Database List" for more information.

Combining these settings, the following command line searches the n_seq input nucleotide query file against the human subset of the refseq_genomic database with the low-complexity and human repeat filters and the megablast algorithm. The expect value cutoff is set to 2 × 10^-10 and the output is saved in refg.output. Note that the command must be entered on a single line; it is wrapped here only because of the line length limit.

blastcl3 -i n_seq -p blastn -d refseq_genomic -u "human[orgn]" -n T -F "m L; R" 
-e 2e-10 -o refg.output
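
When such command lines are generated from a script, building the arguments as a list avoids both the wrapping and the quoting problems. The following Python sketch assembles the command shown above. The function build_blastcl3_command and its keyword names are hypothetical, and it covers only the options used in this section.

```python
import shlex


def build_blastcl3_command(program, query, database, output,
                           entrez=None, megablast=False,
                           filter_string=None, evalue=None,
                           executable="blastcl3"):
    """Return the blastcl3 argument list for one search.

    Each option value is a separate list item, so values that contain
    spaces or brackets (such as the -F and -u strings) need no quoting.
    """
    args = [executable, "-p", program, "-i", query, "-d", database, "-o", output]
    if entrez:
        args += ["-u", entrez]
    if megablast:
        args += ["-n", "T"]
    if filter_string:
        args += ["-F", filter_string]
    if evalue is not None:
        args += ["-e", str(evalue)]
    return args


command = build_blastcl3_command(
    "blastn", "n_seq", "refseq_genomic", "refg.output",
    entrez="human[orgn]", megablast=True,
    filter_string="m L; R", evalue="2e-10",
)
# shlex.join gives a single-line, shell-quoted form for logging or pasting.
single_line = shlex.join(command)
```

The list can be passed directly to subprocess.run(command, check=True) from the directory that contains blastcl3; on Mac and Unix/Linux the executable argument would typically be "./blastcl3".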

    3.2 General protein searches

A protein BLAST search can be used to identify the input query protein and its function through matching to other known proteins and their annotation. One such database is refseq_protein. The following command line searches protein sequences in my_query.txt against this database using blastp. The result is saved in my_output.txt.

blastcl3 -p blastp -i my_query.txt -d refseq_protein -o my_output.txt

For functional analysis of unknown proteins, a more effective approach is to search against the cdd database, since matches from a cdd search identify the conserved functional domain(s) present in the query. The deflines and annotation of the matched domains give a better indication of the query's function. The following command line performs such a search against the cdd database ("-d cdd") using rpsblast ("-R T") and saves the result in my_output.txt:

blastcl3 -p blastp -R T -i my_query.txt -d cdd -o my_output.txt

A search against the pdb database can be used to identify existing structures with matching sequences, which are useful for structure modeling. We do not support PSI-BLAST or PHI-BLAST searches through blastcl3.

    3.3 Translated BLAST searches

Translated searches can be very informative in revealing the possible function of a nucleotide query, since the search and alignment are performed at the protein level, which is more sensitive, more highly conserved, and more biologically relevant. A translated search with a protein query is an effective way to find unidentified homologs/paralogs. We can divide translated searches into three types, based on whether the translation is done for the input, the database, or both.

         3.3.1 blastx: translation of the nucleotide queries

This program searches a nucleotide query against a protein database. It first translates the query in all six frames and then searches the translated protein products against the target protein database. It is useful for identifying the protein product(s) the query may encode. It can also provide information on the function of the protein(s) if a good match to a well-characterized protein is found.

In the example command line below, we search the nucleotide sequences in my_query.txt against the refseq_protein database. The results are saved in the my_output file.

blastcl3 -p blastx -i my_query.txt -d refseq_protein -o my_output -m 9 -v 50 -b 50

The -m 9 setting instructs the BLAST server to return the results in tabular format. The -v 50 -b 50 settings set the upper limit of matched database sequences per query to 50.

         3.3.2 tblastn: protein query against a translated nucleotide database

This program searches an input protein query against a target nucleotide database to find similar protein products that might be encoded by those nucleotide sequences. It is a good way to find as yet unidentified homologs/paralogs of a given protein query. During the search, the nucleotide database entries are first translated in all six frames. The query protein is then compared against those potential products to identify the matches.

The example given below searches the input protein query file my_query.txt against the est_human database to identify human EST entries that may encode proteins similar to the query. The result is saved to my_output.

blastcl3 -p tblastn -i my_query.txt -d est_human -o my_output

         3.3.3 tblastx: searches with translated nucleotide query and nucleotide database

This program compares all six-frame translations of an input nucleotide query against those of a nucleotide database. Since this search is very computationally expensive, we recommend that users apply it with great caution by employing a higher search stringency (a lower -e setting, -F T, and fewer hits than the default -v 500 -b 250) and by limiting the search to a smaller, more specific subset of the database using an Entrez limit.

blastcl3 -p tblastx -i my_query.txt -d chromosome -u "bacteria[orgn]" 
-e 2e-5 -o my_output

The above command line searches the sequences in my_query.txt against the bacterial entries in the chromosome database. The search stringency is increased by lowering the -e setting to 2 × 10^-5. The result is saved in my_output.

Due to the extremely high computational intensity of tblastx searches, we suggest that users set up local standalone BLAST to perform such searches if the search volume is large and/or the need is regular.

     3.4 Genome BLAST searches

Genome BLAST pages collect the genomic sequences and other sequences specific to an organism in a centralized place for easy access. In addition, the matches from searching these databases often contain links to the graphic display in the Map Viewer for that organism. Those organism-specific sequence databases are also available for searching with blastcl3, with one exception: hits will not be linked to the Map Viewer.

Databases for higher eukaryotes are grouped by organism, each with its own unique database prefix. The genome assemblies and the annotation products derived from them are build-specific. These datasets are updated only when new assemblies are made available. They are kept under the gpipe directory. Other organism-specific databases not related to an assembly are updated regularly and placed in a separate directory with the general gp prefix. For example, the human genome database and other human-specific databases have the "gpipe/9606/" or "gp/9606.9558" prefix.

blastcl3 -p blastn -i my_query.txt -d gp/9606/ref_contig -F "m L; R" -o my_output

The above command line searches the query sequences (-i my_query.txt) against the reference contigs from NCBI (-d gp/9606/ref_contig), using the low-complexity and repeat filters (-F "m L; R"). The output is saved to my_output.

For microbes and lower eukaryotes, the available datasets vary depending on the status of the sequencing project. Some are finished, with accompanying protein data; others are unfinished partial wgs assemblies, with or without accompanying protein data. The database naming convention is "Microbial/Taxid".

blastcl3 -p blastx -i my_query.txt -d Microbial/83333 -o my_output

The example command line above uses blastx to search the nucleotide sequences in my_query.txt against the protein dataset for the E. coli K-12 strain. The result is saved in my_output.

Please refer to http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_accessible_blastdblist.html for more information on databases available for remote access by blastcl3. For organisms not listed in that document, searching against nt, wgs, est_others, htgs, or nr with an appropriate Entrez limit may serve as a productive alternative.

blastcl3 -p blastn -i my_query.txt -d wgs -n T -m 9 -o my_output
-u "Oryctolagus cuniculus[Organism] AND wgs[prop]" 

The above command line uses blastn to search the sequences in my_query.txt. It also invokes the megablast algorithm and requests the tabular result. The search is limited to rabbit wgs entries through the Entrez query passed to the -u parameter.
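
When the Entrez limit is assembled in a script, composing it from (value, field) pairs keeps the brackets and the AND operator consistent. The short Python sketch below rebuilds the -u string used in the command above. The helper name entrez_limit is hypothetical, and it does not check whether a term is valid in Entrez.

```python
def entrez_limit(*terms):
    """Join (value, field) pairs into one Entrez query string using AND."""
    parts = []
    for value, field in terms:
        value, field = value.strip(), field.strip()
        if not value or not field:
            raise ValueError("each term needs a non-empty value and field")
        parts.append("%s[%s]" % (value, field))
    return " AND ".join(parts)


# Rebuilds the limit passed to -u in the rabbit wgs example.
rabbit_wgs = entrez_limit(("Oryctolagus cuniculus", "Organism"), ("wgs", "prop"))
```

The resulting string is passed as the single value of -u. On a shell command line it must be enclosed in quotes, because it contains spaces and brackets.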

4. Appendix: Parameters and their accepted values

As mentioned in Section 1, blastcl3 has no GUI and works only in a command terminal. Users execute the program by issuing command lines, and control the search through parameter/value pairs in the command line. The command line parameters and their accepted values for this program are listed individually here. The options commonly adjusted during actual searches are: -i, -d, -p, -o, -e, -F, -u, -b, -v, -m, and -n. The first four are mandatory.

Table 4.1
Option: -p
Function: Specifies which program to run
Default: None, mandatory
Input Format: String
Example: To run blastn program use: -p blastn
Note:
Program string options and type of search they specify
Program      Query                   Database
blastn       nucleotide              nucleotide
blastp       protein                 protein
blastx       nucleotide, translated  protein
tblastn      protein                 nucleotide, translated
tblastx      nucleotide, translated  nucleotide, translated

Table 4.2
Option: -d
Function: Specifies target database(s) to search against
Default: nr
Input Format: String
Example: To search est_human, use: -d est_human
Note:
To search against multiple databases, use -d "db1 db2". Be conservative: do not combine large databases, and use stringent conditions, since searches against large databases may be aborted when they reach the CPU time limit. Currently, the CPU limit is set at one hour.

Table 4.3
Option: -i
Function: Specifies input query file
Default: stdin
Input Format: String, mandatory
Example: To use sequences from query.txt as query, use: -i query.txt
Note:
Use the complete file name WITH its extension. To use stdin default, omit the -i and redirect using <.
blastcl3 -d nt -p blastn -e 0.001 < mito.txt

Table 4.4
Option: -e
Function: Specifies Expect value cutoff
Default: 10
Input Format: Real
Example: To lower the -e setting to 0.001, use: -e 0.001
Note:
Accepted formats are integer, fraction, decimal, exponential, and scientific notation. To set the cutoff to 2 × 10^-20, use -e 2e-20

Table 4.5
Option: -m
Function: Specifies alignment view option
Default: 0
Input Format: Integer
Example: To display the result in XML format use: -m 7
Note:
Option values and the output formats they specify
    0    Pairwise
    1    query-anchored showing identities
    2    query-anchored no identities
    3    flat query-anchored, show identities
    4    flat query-anchored, no identities
    5    query-anchored no identities and blunt ends
    6    flat query-anchored, no identities and blunt ends
    7    XML Blast output
    8    tabular (not post processing)
    9    tabular with comment lines (post-processed, sorted)
    10   ASN, text
    11   ASN, binary
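
As an illustration of working with the XML produced by -m 7, the Python sketch below lists each hit together with the e-value of its first HSP. It assumes the usual NCBI BlastOutput element names (Iteration, Iteration_query-def, Hit, Hit_id, Hit_hsps, Hsp, Hsp_evalue) and a single well-formed XML document per input. Neither assumption is stated in this document, and the output of some BLAST versions may differ. The function name first_hsp_evalues is hypothetical.

```python
import xml.etree.ElementTree as ET


def first_hsp_evalues(xml_text):
    """Return (query, hit id, e-value of the first HSP) for every hit.

    Hits without any HSP element are skipped.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            hsp = hit.find("Hit_hsps/Hsp")
            if hsp is None:
                continue
            rows.append(
                (query, hit.findtext("Hit_id"), float(hsp.findtext("Hsp_evalue")))
            )
    return rows
```

Reading the file saved with -o and passing its contents to first_hsp_evalues gives a list that can be sorted or filtered by e-value.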

Table 4.6
Option: -o
Function: Specifies the output file
Default: stdout (print to screen)
Input Format: String [file name]
Example: To save result in out.txt, use: -o out.txt
Note:
-p, -i, -d, -o are the core parameters needed for a blastcl3 search.

Table 4.7
Option: -F
Function: Specifies which filter(s) to use to mask query sequence
Default: T (DUST for nucleotide, SEG for protein)
Input Format: String
Example: To apply the low-complexity filter to the lookup table only, use: -F "m L"
Note:
Accepted strings: T, F, D, L, R, V, S, C, and m. For more information see:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/URLAPI_node82.html.

Table 4.8
Option: -G
Function: Cost to open a gap
Default: 0
Input Format: [Integer]
Example: To increase the gap open penalty to 10, use: -G 10
Note:
Zero invokes the default, which is 5 for blastn and varies for the other programs. For BLAST searches (since version 2.2.13), only a controlled set of -G/-E value pairs is acceptable for a given scoring matrix.

Table 4.9
Option: -E
Function: Cost to extend a gap
Default: 0
Input Format: [Integer]
Example: To increase the gap extension penalty to 4, use: -E 4
Note:
Zero invokes the default, which is 2 for blastn and varies for the other programs. For more information, see: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/URLAPI_node89.html

Table 4.10
Option: -X
Function: X dropoff value for gapped alignment (in bits)
Default: 0
Input Format: [Integer]
Example: To increase the gapped alignment dropoff to 40, use: -X 40
Note:
Gapped Alignment Dropoff Default Settings (in bits)
Program       blastn       megablast       tblastx       others
Value           30            20              0            15

Table 4.11
Option: -I (capital i)
Function: Show GI in definition line
Default: F
Input Format: [T/F]
Example: To activate the GI display use: -I T
Note:
Sample displays for the two settings:
-I T: gi|223046|prf||0410468A ... 
-I F: prf||0410468A ...

Table 4.12
Option: -q
Function: Penalty for a nucleotide mismatch
Default: -3
Input Format: [Integer]
Example: To set penalty to -2, use: -q -2
Note:
For blastn only. Different -r/-q ratios are optimal for aligning sequences with different percentages of similarity.

Table 4.13
Option: -r
Function: Specifies reward for a nucleotide match
Default: 1
Input Format: [Integer]
Example: To increase the reward to 2, use: -r 2
Note:
For blastn only. The other programs use an external scoring matrix to determine this. See -M table in blastall for more details.

Table 4.14
Option: -v
Function: Specifies the upper limit of database sequences to show descriptions for
Default: 500
Input Format: [Integer]
Example: To increase the descriptions displayed to 1000 use: -v 1000
Note:
Web counterpart is "Descriptions". Actual number may be lower due to lack of hits above -e cutoff.

Table 4.15
Option: -b
Function: Specifies the upper limit of database sequences to show alignments for
Default: 250
Input Format: [Integer]
Example: To increase the alignments displayed to 1000, use: -b 1000
Note:
Physical limit is 200000. Web counterpart: "Alignments". This is NOT the total number of alignment segments or high scoring pairs (HSPs). Rather, it is the number of database sequences with HSP(s) to the query.

Table 4.16
Option: -f
Function: Threshold for extending hits
Default: 0
Input Format: Integer
Example: To increase this threshold to 15, use: -f 15
Note:
Default is used if set to zero, not relevant to blastn or megablast. Extension threshold default settings are:
Program   blastp   blastn   blastx   tblastn   tblastx  megablast
Value       11       0        12       13        13        0

Table 4.17
Option: -g
Function: Perform gapped alignment
Default: T
Input Format: [T/F]
Example: To do only ungapped alignment, use: -g F
Note:
Default is gapped alignment, not available with tblastx.

Table 4.18
Option: -Q
Function: Query genetic code to use
Default: 1
Input Format: [Integer]
Example: To set the genetic code (translation table) to 14, use: -Q 14
Note:
This specifies the translation table used in query translation during blastx and tblastx searches. The default is the universal genetic code.

Table 4.19
Option: -D
Function: DB Genetic code
Default: 1
Input Format: [Integer]
Example: To set the genetic code (translation table) to 14, use: -D 14
Note:
This specifies the translation table used in the database translation in tblastn and tblastx searches. Details on the translation tables are at: www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c

Table 4.20
Option: -a
Function: Number of processors to use
Default: 1
Input Format: [Integer]
Example: To change this to two CPUs, use: -a 2
Note:
This may not be relevant after the splitd implementation.

Table 4.21
Option: -O
Function: To save SeqAlign object
Default: N/A
Input Format: String [File Out]
Example: To save SeqAlign object to blast_seqalign, use: -O blast_seqalign
Note:
Users can use this output to reformat the result into a different format using NCBI toolkit functions. See the ftp://ftp.ncbi.nih.gov/blast/demo/ subdirectory for more information.

Table 4.22
Option: -J
Function: Believe the query definition line
Default: F
Input Format: [T/F]
Example: To set this to true, use: -J T
Note:
The default is set to false since most query deflines do not follow NCBI convention.

Table 4.23
Option: -M
Function: Protein scoring matrix to use
Default: BLOSUM62
Input Format: [String]
Example: To change this to PAM30, use: -M PAM30
Note:
Accepted inputs are: BLOSUM45, BLOSUM62, BLOSUM80, PAM30, or PAM70.

Table 4.24
Option: -W
Function: Word size
Default: 0
Input Format: [Integer]
Example: To set word size to 32, use: -W 32
Note:
Word size setting for different programs
Program   blastn   megablast  all others
Value       11        28          3

Table 4.25
Option: -z
Function: Effective length of the database
Default: 0
Input Format: [Real]
Example: To set this to 10000000, use: -z 10000000
Note:
If this parameter is left out or set to zero (0), BLAST uses the actual database size.

Table 4.26
Option: -K
Function: Number of best hits from a region to keep
Default: 0
Input Format: [Integer]
Example: To keep 200 hits, use: -K 200
Note:
This selects the specified number of best hits for a given region of the query for further evaluation. Off by default, 100 recommended if used.

Table 4.27
Option: -P
Function: Use multiple hit
Default: 0
Input Format: Integer
Example: To do single hit, use: -P 1
Note:
Zero is for multiple hit, 1 for single hit. Not applicable to blastn.

Table 4.28
Option: -Y
Function: Effective length of the search space
Default: 0
Input Format: [Real]
Example: To set this to 10000000, use: -Y 10000000
Note:
This is the product of effective query length and effective database length - actual length corrected for edge effects. Use zero for actual size.

Table 4.29
Option: -S
Function: Strands of the nucleotide query to use in the search
Default: 3
Input Format: [Integer]
Example: To search with the reverse complement strand only, use: -S 2
Note:
-S Input Code And Meaning for blastn, blastx, and tblastx.
Meaning  Input sequence    Reverse complement    Both
Value          1                   2               3

Table 4.30
Option: -T
Function: Produce HTML output
Default: F
Input Format: [T/F]
Example: To generate HTML formatted output, use: -T T
Note:
With -T T, BLAST will hyperlink the matched subject sequences to their actual entries in Entrez.

Table 4.31
Option: -u
Function: Restrict search of database to the subset satisfying the query
Default: N/A
Input Format: [Entrez Term] in quotes
Example: To restrict entries to mRNA use: -u "biomol_mrna[prop]"
Note:
The argument to this parameter is a set of valid Entrez query terms. The BLAST server uses the terms to retrieve a list of GI numbers and restricts the BLAST search to the entries specified by that list. Make sure valid terms are used. For example, it does not make sense to restrict a search to genomic sequences while searching against the est database. See Entrez Help.

Table 4.32
Option: -U
Function: Use lower case filtering of FASTA sequence
Default: F
Input Format: [T/F]
Example: To turn lowercase filter on, use: -U T
Note:
Make sure that the portions of the query sequences to be searched are in UPPERCASE and that only the portions to be masked are in lowercase.

Table 4.33
Option: -y
Function: X dropoff value for ungapped extensions (in bits)
Default: 0
Input Format: [Real]
Example: To increase the dropoff to 25, use: -y 25
Note:
Default settings for ungapped alignment X dropoff (-y, in bits)
Program   blastn   megablast   others
Value       20        10          7

Table 4.34
Option: -Z
Function: X dropoff value for final gapped alignment (in bits)
Default: 0
Input Format: [Integer]
Example: To increase this dropoff to 60, use: -Z 60
Note:
Large dropoff value settings may help generate longer alignments.
Default settings for final gapped alignment X dropoff (-Z, in bits)
Program   blastn   megablast   tblastx   all others
Value       50        50         25          0

Table 4.35
Option: -R
Function: Run rpsblast search
Default: F
Input Format: [T/F]
Example: To run rpsblast search, use: -R T
Note:
Setting this to "T" performs an rpsblast search against the CDD database. It requires an appropriate -d input. See Remote Accessible BLAST Databases.

Table 4.36
Option: -n
Function: Enable megablast search
Default: F
Input Format: [T/F]
Example: To enable megablast search, use: -n T
Note:
Setting this to "T" invokes the megablast algorithm. -W will default to 28 and queries will be concatenated. This helps speed up the search at the expense of search sensitivity.

Table 4.37
Option: -L
Function: Location on query sequence
Default: N/A
Input Format: [String]
Example: To search with positions 100 to 400 of a query, use: -L "100,400"
Note:
In -L "100,400", 100 is the start and 400 the end.

Table 4.38
Option: -A
Function: Multiple hits window size
Default: 0
Input Format: [Integer]
Example: To increase the window size to 50, use: -A 50
Note:
Default -A setting for different programs
Program   blastn   megablast   all others
Value        0         0          40

Table 4.39
Option: -w
Function: Frame shift penalty
Default: 0 (no penalty)
Input Format: [Integer]
Example: To set OOF penalty to 10, use: -w 10
Note:
A non-zero setting invokes the OOF (Out Of Frame) algorithm for blastx.

Table 4.40
Option: -t
Function: Length of the largest intron allowed in tblastn for linking HSPs
Default: 0
Input Format: [Integer]
Example: To allow linking of HSPs 10000 letters apart, use: -t 10000
Note:
Zero disables linking. Otherwise, the value specified will be used.