1. Introduction
The NCBI BLAST web server provides a convenient, user-friendly way for individuals to search
their queries against different public sequence databases. Even though this server can take
multiple queries and perform batch searches, truly large-scale batch searches may not go through
if the input queries are long and the search settings are not stringent. In addition, the set of
databases available through the web interface is somewhat limited. The BLAST client provides a way to circumvent
those limitations.
The BLAST client, or blastcl3, bypasses the web browser and interacts directly with the NCBI BLAST server
that powers the NCBI web BLAST service (www.ncbi.nlm.nih.gov/BLAST/).
It performs the batch search with multiple sequences by taking one query sequence at a time from the input file,
formulating the search according to the settings of the command line parameters, and sending the search through the
internet connection to NCBI BLAST server for processing. The program receives the search results from the BLAST server,
in the format set in the search command line, and saves them to the specified local file. The program loops through
all the queries in the input file until all are searched.
This program has no graphical user interface (GUI) and must be executed from the command line in a
terminal window. Users control the program through command line options. A detailed list of
command line options is given in Section 4. For usage and situation examples, see
Section 3.
2. Installation and setup
NCBI provides the BLAST client as an archive whose name begins with netblast-, separate from the archive for the standalone
command line programs (name beginning with blast-) and the one for the standalone server BLAST (name beginning with wwwblast-).
All of them can be found at:
ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/
2.1 Installation
For a Linux or Unix environment, installation is straightforward. Place the archive in the
desired directory and extract it using the following command line:
tar zxvf netblast-##-**.tar.gz
The resulting netblast-#.#.# directory contains bin, doc, and data subdirectories.
The program, blastcl3, is under the bin subdirectory. The matrices BLAST needs for protein alignments are
under the data subdirectory, while the doc subdirectory contains netblast.html and firewall.html with more
information on configuration of blastcl3 behind firewalls.
The package for Windows can be extracted using WinZip. It does not have this directory structure.
2.2 Firewall settings
The setup for NCBI network clients has been greatly simplified. Users who are not behind a firewall can use the
program right after the extraction above. Users who are behind a firewall but already use Sequin or Entrez,
or whose system administrator has already performed the setup, should also be able to start
performing searches after installation.
If neither of these is the case, users will need to make sure that the following IP address/port combinations
are open in the firewall configuration.
Table 3. Firewall Ports Needed by BLAST Client for NCBI Connection
| IP Address | Port Number |
| 130.14.29.112 | 5861 |
| 130.14.29.112 | 5862 |
| 130.14.29.112 | 5863 |
Note: Please refer to 'firewall.html' included in the package for details.
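As a rough aid for checking the firewall configuration, the Python sketch below tries to open a TCP connection to each address/port pair from Table 3. The function names are hypothetical, and a successful connection only shows that the port is reachable from this machine. It does not prove that blastcl3 itself will work, and the address is simply the one listed in this document.

```python
import socket

# Address and ports copied from Table 3 of this document.
NCBI_FIREWALL_HOST = "130.14.29.112"
NCBI_FIREWALL_PORTS = (5861, 5862, 5863)


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_firewall_ports(host=NCBI_FIREWALL_HOST, ports=NCBI_FIREWALL_PORTS):
    """Map each port to True (reachable) or False (blocked or unreachable)."""
    return {port: port_is_open(host, port) for port in ports}
```

Calling check_firewall_ports() returns a dictionary keyed by port number; any False entry suggests that the corresponding port should be reviewed with the system administrator.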
In addition to this, users will need to create an .ncbirc file to instruct blastcl3 how to make the connection
to NCBI. This file should contain the information listed below and be placed in the home directory. For PC running
Windows, the file is named ncbi.ini which should be placed under the windows directory.
[NCBI]
DATA=/home/johndoe/netblast-#.#.#/data
[CONN]
FIREWALL=TRUE
[NET_SERV]
SRV_CONN_MODE=SERVICE
Note: Replace the path to data directory with the path specific to your installation.
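For users who prefer to generate this file rather than type it, the following Python sketch writes the three sections shown above. It is only an illustration: the function names are hypothetical, the data directory must be supplied by the user, and the sketch assumes that blastcl3 reads the file as ordinary KEY=VALUE lines grouped under section headers.

```python
import configparser
import io
from pathlib import Path


def build_ncbirc(data_dir):
    """Return the text of an .ncbirc file with the sections listed above."""
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep the key names in upper case
    config["NCBI"] = {"DATA": data_dir}
    config["CONN"] = {"FIREWALL": "TRUE"}
    config["NET_SERV"] = {"SRV_CONN_MODE": "SERVICE"}
    buffer = io.StringIO()
    # Write KEY=VALUE without spaces around "=", as in the listing above.
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


def write_ncbirc(data_dir, target=None):
    """Write the file to target, or to ~/.ncbirc when no target is given."""
    path = Path(target) if target is not None else Path.home() / ".ncbirc"
    path.write_text(build_ncbirc(data_dir))
    return path
```

On Windows, the target would be the ncbi.ini file described above instead of the default location.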
Users may encounter problems while using blastcl3. The most common cause is the firewall configuration.
A representative error message generally contains "[CONN_Open] Cannot open connection",
"<<< Re-establishing NETBLAST Service >>>", or something similar.
Adding the following two lines in the .ncbirc (or ncbi.ini) file will increase the timeout setting and generate
more informative messages that are useful in debugging the problem:
TIMEOUT=300
DEBUG_PRINTOUT=DATA
Search-related errors from the NCBI BLAST server are typically accompanied by the RIDs of the relevant searches. Those RIDs
should be saved and sent to NCBI at blast-help@ncbi.nlm.nih.gov for
troubleshooting purposes.
As an alternative to blastcl3, the NCBI BLAST web server also supports URLAPI, which uses URL-encoded
commands to interact with Blast.cgi directly, either to "Put" search requests onto the BLAST server
or to "Get" search results from the same server. For details on BLAST URLAPI, please refer to:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/
3. Practical usage examples
Before we get into actual use, we need to discuss the format of the input query. The only
query format blastcl3 accepts is FASTA. In this format, each query begins with a definition line,
or defline as it is commonly known, which starts with a "greater than" sign (>). The defline contains a
basic description of the sequence, such as its source, the gene it represents, or the ways the sequence
is identified. The defline terminates with a hard return. The actual sequence immediately follows
the defline in one or more lines, each of which terminates with a hard return. Multiple query sequences
should be concatenated one after another. Sample query sequences are presented below for
your reference.
>gi|4557757|ref|NP_000240.1| MutL protein homolog 1
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLIQIQDNGTGIRK
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTADGKCAYRASYSDGKLKAPPK
PCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNA
STVDNIRSIFGNAVSRELIEIGCEDKTLAFKMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVY
AAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLP
GLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDIS
SGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSSNPRKRHREDSDVEMVEDDSRKEM
TAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWALAQHQTKLYLLNTTKLSEELF
YQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEI
DEEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQ
QSEVPGSIPNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC
>gi|68348711|ref|NP_001234.2| tumor necrosis factor receptor 8
MRVLLAALGLLFLGALRAFPQDRPFEDTCHGNPSHYYDKAVRRCCYRCPMGLFPTQQCPQRPTDCRKQCE
PDYYLDEADRCTACVTCSRDDLVEKTPCAWNSSRVCECRPGMFCSTSAVNSCARCFFHSVCPAGMIVKFP
GTAQKNTVCEPASPGVSPACASPENCKEPSSGTIPQAKPTPVSPATSSASTMPVRGGTRLAQEAASKLTR
APDSPSSVGRPSSDPGLSPTQPCPEGSGDCRKQCEPDYYLDEAGRCTACVSCSRDDLVEKTPCAWNSSRT
CECRPGMICATSATNSRARCVPYPICAAETVTKPQDMAEKDTTFEAPPLGTQPDCNPTPENGEAPASTSP
TQSLLVDSQASKTLPIPTSAPVALSSTGKPVLDAGPVLFWVILVLVVVVGSSAFLLCHRRACRKRIRQKL
HLCYPVQTSQPKLELVDSRPRRSSTQLRSGASVTEPVAEERGLMSQPLMETCHSVGAAYLESLPLQDASP
AGGPSSPRDLPEPRVSTEHTNNKIEKIYIMKADTVIVGTVKAELPEGRGLAGPAEPELEEELEADHTPHY
PEQETEPPLGSCSDVMLSVEEEGKEDPLPTAASGK
Note that the file containing the query sequences has to be saved as a plain text file.
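To check that a query file follows the layout described above before submitting it, a small reader such as the Python sketch below can be used. It is a minimal illustration, not part of blastcl3: the function name is hypothetical, and it only checks that every sequence is preceded by a defline.

```python
def read_fasta(lines):
    """Yield (defline, sequence) pairs from an iterable of FASTA text lines."""
    defline, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if defline is not None:
                yield defline, "".join(chunks)
            defline, chunks = line[1:], []
        elif defline is None:
            raise ValueError("sequence data found before the first defline")
        else:
            chunks.append(line)
    if defline is not None:
        yield defline, "".join(chunks)
```

Passing an open file handle, for example list(read_fasta(open("my_query.txt"))), returns one pair per query, which makes it easy to count the queries or spot a missing defline.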
The program runs in a command or terminal window. On a PC, the command window can be launched
using "Start ► Program ► Accessories ► Command Prompt". On a Mac, the
Terminal program icon is usually under the Utilities folder; double-clicking the grey icon will
launch it.
In the terminal window, cd to the directory containing blastcl3, then run the program from there.
Typing "blastcl3 -" without quotes followed by a return should display the command line options on the screen.
On Mac and Unix/Linux platforms, type "./blastcl3 -" without quotes.
Since the list of available databases has grown significantly, we have removed it from this
document. Users can find the list in this file:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_accessible_blastdblist.html
3.1 General nucleotide searches
The primary use of a nucleotide BLAST search is to identify the input query by finding whether exact
matches are present in the database. The same approach can be used to identify the genomic
counterpart of an input mRNA sequence or vice versa. Another use is to search with primer pairs
to identify the annealing target and possible secondary annealing sites.
For sequences from well-studied non-prokaryote model organisms, a good approach is to search against the
refseq_rna database with an Entrez limit. Alternatively, a search against nr, with or
without a limit to the target organism, can also offer good leads.
The following example command lines search the input query file new_seq.txt against either the
refseq_rna or nr database and save the results in n_refm.out and n_nr.out, respectively.
blastcl3 -p blastn -i new_seq.txt -d refseq_rna -o n_refm.out
blastcl3 -p blastn -i new_seq.txt -d nr -o n_nr.out
Note: the complete file name should be used as input to -i parameter. To see the complete extension under
Windows, you will need to change the setting of the "View" tab under "Tools ► Folder Option" to
uncheck "Hide extensions of known file types".
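When many such command lines have to be issued, it can help to assemble them in a script. The Python sketch below builds the argument list for a search of new_seq.txt against refseq_rna and nr using only the -p, -i, -d, and -o options described in this document. The function name is hypothetical, and the sketch only prints the commands; running them assumes that blastcl3 is installed and on the search path.

```python
import shlex


def blastcl3_command(program, query_file, database, output_file, extra=()):
    """Return the argument list for one blastcl3 run."""
    argv = ["blastcl3", "-p", program, "-i", query_file,
            "-d", database, "-o", output_file]
    argv.extend(extra)  # additional option/value pairs, e.g. ["-e", "0.001"]
    return argv


if __name__ == "__main__":
    for database, output in (("refseq_rna", "n_refm.out"), ("nr", "n_nr.out")):
        argv = blastcl3_command("blastn", "new_seq.txt", database, output)
        # shlex.join (Python 3.8+) shows the command as it would be typed.
        print(shlex.join(argv))
```

To actually run a search, the list can be handed to subprocess.run(argv, check=True). Passing a list rather than a single string avoids quoting problems with arguments that contain spaces or brackets.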
We can further adjust the settings to restrict the search to the mouse entries in those two databases by using an Entrez
limit, invoking the megablast algorithm, and using a lower expect value of 0.001. The last two settings increase the search
stringency. The actual parameter settings in the command line are:
-u "mouse[organism]" -n T -e 0.001
For easy parsing of the BLAST search result, we can request the result be returned in either XML or
"Hit Table" (tabular) format using "-m 7" or "-m 9" (without quotes) in the command line.
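As an illustration of how the tabular output can be parsed, the Python sketch below reads a "-m 9" (or "-m 8") result. It assumes the usual layout of the legacy tabular format: comment lines starting with "#", and hit lines with 12 tab-separated columns (query id, subject id, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, bit score). The field names are this sketch's own labels, and the column layout should be checked against the comment lines of an actual result.

```python
# Labels chosen for this sketch; the order follows the assumed 12-column layout.
HIT_FIELDS = (
    "query_id", "subject_id", "percent_identity", "alignment_length",
    "mismatches", "gap_openings", "q_start", "q_end", "s_start", "s_end",
    "e_value", "bit_score",
)
INT_FIELDS = ("alignment_length", "mismatches", "gap_openings",
              "q_start", "q_end", "s_start", "s_end")
FLOAT_FIELDS = ("percent_identity", "e_value", "bit_score")


def parse_hit_table(lines):
    """Yield one dict per hit line, skipping blank and '#' comment lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue
        values = line.split("\t")
        if len(values) != len(HIT_FIELDS):
            raise ValueError(
                f"expected {len(HIT_FIELDS)} columns, got {len(values)}")
        hit = dict(zip(HIT_FIELDS, values))
        for name in INT_FIELDS:
            hit[name] = int(hit[name])
        for name in FLOAT_FIELDS:
            hit[name] = float(hit[name])
        yield hit
```

With an open result file as input, a filter such as [h for h in parse_hit_table(handle) if h["e_value"] <= 1e-10] keeps only the strongest matches.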
When searching a genomic DNA query against a nucleotide database, we should invoke the repeat filter to mask
the repeat regions and prevent the BLAST program from being inundated by spurious hits to those
regions. The following two -F settings are for human and rodents, respectively.
-F "m L; R"
-F "R -d rodent.lib"
Other species-specific repeat filters are also available. The command line convention is:
-F "R -d repeat/taxid_repeat"
Refer to the end of Table 2
in the "Remotely Accessible BLAST Database List" for more information.
Combining these together, the following command line searches the n_seq input nucleotide query
file against the human subset of the refseq_genomic database with the low complexity and human
repeat filters and the megablast algorithm. The expect value cutoff is set to 2x10^-10 and
the output is saved in refg.output. Note that the command should be entered on a single line; it is wrapped
here only because of the line length limit.
blastcl3 -i n_seq -p blastn -d refseq_genomic -u "human[orgn]" -n T -F "m L; R"
-e 2e-10 -o refg.output
3.2 General protein searches
A protein BLAST search can be used to identify the input query protein and its function through
matching to other known proteins and their annotation. One such database is refseq_protein.
The following command line searches protein sequences in my_query.txt against this database
using blastp. The result is saved in my_output.txt.
blastcl3 -p blastp -i my_query.txt -d refseq_protein -o my_output.txt
For functional analysis of unknown proteins, a more effective way is to search against the cdd database, since
matches from a cdd search identify the conserved functional domain(s) present in the query. The deflines and
annotation of these matched domains give a better indication of the query's function. The following
command line does such a search against the cdd database ("-d cdd") using rpsblast ("-R T") and saves
the result in my_output.txt:
blastcl3 -p blastp -R T -i my_query.txt -d cdd -o my_output.txt
A search against the pdb database can be used to identify existing structures with matching
sequences, which is useful for structure modeling purposes. We do not support PSI-BLAST or PHI-BLAST searches
through blastcl3.
3.3 Translated BLAST searches
Translated searches can be very informative in revealing the possible function of a nucleotide query, since
the search and alignment are performed at the protein level, which is more sensitive, shows a higher level of
conservation, and is more biologically relevant. A translated search with a protein query is an effective way to
find unidentified homologs/paralogs. We can divide the translated searches into three types based on whether
the translation is done for the input, the database, or both.
3.3.1 blastx: translation of the nucleotide queries
This program searches a nucleotide query against a protein database. It translates the query
in all six frames first and then searches the translated protein products against the target protein
database. It is useful in identifying the protein product(s) the query may encode. It can also provide
information on the functions of the protein(s) if a good match to a well-characterized protein
is found.
In the example command line below, we search the nucleotide sequences in my_query.txt
against the refseq_protein database. The results are saved in the my_output file.
blastcl3 -p blastx -i my_query.txt -d refseq_protein -o my_output -m 9 -v 50 -b 50
The -m 9 instructs blast server to return the results in tabular format. The upper limit of matched
database sequences per query is set to 50.
3.3.2 tblastn: protein query against a translated nucleotide database
This program searches an input protein query against a target nucleotide database
to find similar protein products that might be encoded by those nucleotide sequences. It is a
good way to find as yet unidentified homologs/paralogs of a given protein query. During the search,
the nucleotide database entries are first translated in all six frames. The query protein is then
compared against those potential products to identify the matches.
The example given below searches the input protein query file my_query.txt against the est_human database to
identify human EST entries that may encode proteins similar to the query. The result is saved to my_output.
blastcl3 -p tblastn -i my_query.txt -d est_human -o my_output
3.3.3 tblastx: searches with translated nucleotide query and nucleotide database
This program compares the six-frame translations of an input nucleotide query against
those of a nucleotide database. Since this search is very computationally expensive, we recommend that users
employ it with great caution: use a higher search stringency (a lower -e setting, -F T,
and fewer hits than the default -v 500 -b 250), and limit the search to a smaller, more specific
subset of the database using an Entrez limit.
blastcl3 -p tblastx -i my_query.txt -d chromosome -u "bacteria[orgn]"
-e 2e-5 -o my_output
The above command line searches the sequences in my_query.txt against the bacterial entries in the chromosome
database. The search stringency is increased by lowering the -e setting to 2 x 10^-5.
The result is saved in my_output.
Due to the extremely high computational intensity of tblastx searches, we suggest that users
set up a local standalone BLAST to perform such searches if the search volume is large and/or the need is regular.
3.4 Genome BLAST searches
Genome BLAST pages collect the genomic sequences and other sequences specific to an
organism in a centralized place for easy access. In addition, the matches from searching these databases
often contain links to the graphic display in the Map Viewer for that organism. Those
organism-specific sequence databases are also available for search using blastcl3, with one exception:
hits will not be linked to the Map Viewer.
Databases for higher eukaryotes are grouped according to organism, each with its own unique database
prefix. The genome assemblies and the annotation products derived from them are build-specific. These datasets are
updated only when new assemblies are made available. They are kept under the gpipe directory. Other organism-specific
databases not related to an assembly are updated regularly and placed in a separate directory with the general gp prefix.
For example, the human genome database and other human specific databases have the
"gpipe/9606/" or "gp/9606.9558" prefix.
blastcl3 -p blastn -i my_query.txt -d gp/9606/ref_contig -F "m L; R" -o my_output
The above command line searches the query sequences (-i my_query.txt) against the reference contigs from NCBI
(-d gp/9606/ref_contig), using the low complexity and repeat filters (-F "m L; R"). The output is saved to my_output.
For microbes and lower eukaryotes, the available datasets vary depending on the status of the sequencing project.
Some are finished with accompanying protein data; others are unfinished partial wgs with or without accompanying protein data.
The database naming convention is "Microbial/Taxid".
blastcl3 -p blastx -i my_query.txt -d Microbial/83333 -o my_output
The example command line above uses blastx to search the nucleotide sequences in my_query.txt against the
protein dataset for the E. coli K-12 strain. The result is saved in my_output.
Please refer to http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_accessible_blastdblist.html
for more information on the databases available for remote access by blastcl3. For organisms not listed in that document,
searching against nt, wgs, est_others, htgs, or nr with an appropriate Entrez limit may serve as a productive alternative.
blastcl3 -p blastn -i my_query.txt -d wgs -n T -m 9 -o my_output
-u "Oryctolagus cuniculus[Organism] AND wgs[prop]"
The above command line uses blastn to search the sequences in my_query.txt. It also invokes
the megablast algorithm and requests the tabular result. The search is limited to rabbit wgs entries through
the Entrez query passed to the -u parameter.
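When Entrez limits are assembled in a script, combining the individual terms programmatically helps avoid quoting mistakes. The Python sketch below joins terms with AND and places the result after -u in an argument list that mirrors the rabbit wgs example. The helper name is hypothetical, and the sketch does not check whether the terms are valid Entrez terms.

```python
def entrez_limit(*terms):
    """Join Entrez terms with AND for use as the value of the -u option."""
    cleaned = [term.strip() for term in terms if term and term.strip()]
    if not cleaned:
        raise ValueError("at least one Entrez term is required")
    return " AND ".join(cleaned)


limit = entrez_limit("Oryctolagus cuniculus[Organism]", "wgs[prop]")

# The whole limit is a single list element, so no shell quoting is needed
# when the list is passed to subprocess.run.
argv = ["blastcl3", "-p", "blastn", "-i", "my_query.txt", "-d", "wgs",
        "-n", "T", "-m", "9", "-o", "my_output", "-u", limit]
```

Keeping the limit as one list element corresponds to enclosing it in double quotes on the command line.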
4. Appendix: Parameters and their accepted values
As mentioned in Section 1, blastcl3 has no GUI and works only in a command
terminal. Users execute the program by issuing command lines, and control the search through parameter/value pairs
in the command line. The command line parameters and their accepted values for this program are listed individually here.
The options commonly adjusted during actual searches are: -i, -d, -p, -o, -e, -F, -u, -b, -v, -m, and -n. The first four are
mandatory.
| Table 4.1 |
| Option | -p |
| Function | Specifies which program to run |
| Default | None, mandatory |
| Input Format | String |
| Example | To run blastn program use: -p blastn |
Note: Program string options and the type of search they specify
| Program | Query | Database |
| blastn | nucleotide | nucleotide |
| blastp | protein | protein |
| blastx | nucleotide, translated | protein |
| tblastn | protein | nucleotide, translated |
| tblastx | nucleotide, translated | nucleotide, translated |
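The program table above can be turned into a small lookup for scripts that need to pick a -p value from the query and database types. The Python sketch below is an illustration only; the dictionary and function names are hypothetical.

```python
# (query type, database type) for each -p value, taken from the table above.
PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("protein", "protein"),
    "blastx": ("nucleotide", "protein"),
    "tblastn": ("protein", "nucleotide"),
    "tblastx": ("nucleotide", "nucleotide"),
}

# Which side is translated before comparison, for the translated programs.
TRANSLATED = {"blastx": "query", "tblastn": "database", "tblastx": "both"}


def candidate_programs(query_type, database_type):
    """Return the -p values that accept the given query and database types."""
    return [name for name, types in PROGRAMS.items()
            if types == (query_type, database_type)]
```

For a nucleotide query and a nucleotide database, candidate_programs returns both blastn and tblastx; the choice between them depends on whether the comparison should be made at the nucleotide or the protein level.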
| Table 4.2 |
| Option | -d |
| Function | Specifies target database(s) to search against |
| Default | nr |
| Input Format | String |
| Example | To search est_human, use: -d est_human |
Note: To search against multiple databases, use -d "db1 db2". Be conservative: do not combine large databases,
and use stringent conditions, since searches against large databases may be aborted due to the CPU time limit. Currently,
the CPU limit is set at one hour.
| Table 4.3 |
| Option | -i |
| Function | Specifies input query file |
| Default | stdin |
| Input Format | String, mandatory |
| Example | To use sequences from query.txt as query, use: -i query.txt |
Note: Use the complete file name WITH its extension. To use stdin default, omit the -i and redirect using <.
blastcl3 -d nt -p blastn -e 0.001 < mito.txt
| Table 4.4 |
| Option | -e |
| Function | Specifies Expect value cutoff |
| Default | 10 |
| Input Format | Real |
| Example | To lower the -e setting to 0.001, use: -e 0.001 |
Note: Accepted formats are integer, fraction, decimal, exponential, and scientific notation. To set the cutoff
to 2×10^-20, use -e 2e-20
| Table 4.5 |
| Option | -m |
| Function | Specifies alignment view option |
| Default | 0 |
| Input Format | Integer |
| Example | To display the result in XML format use: -m 7 |
Note: Option values and the output formats they specify
0 Pairwise
1 query-anchored showing identities
2 query-anchored no identities
3 flat query-anchored, show identities
4 flat query-anchored, no identities
5 query-anchored no identities and blunt ends
6 flat query-anchored, no identities and blunt ends
7 XML Blast output
8 tabular (not post processing)
9 tabular with comment lines (post-processed, sorted)
10 ASN, text
11 ASN, binary
| Table 4.6 |
| Option | -o |
| Function | Specifies the output file |
| Default | stdout (print to screen) |
| Input Format | String [file name] |
| Example | To save result in out.txt, use: -o out.txt |
Note: -p, -i, -d, -o are the core parameters needed for a blastcl3 search.
| Table 4.7 |
| Option | -F |
| Function | Specifies which filter(s) to use to mask query sequence |
| Default | T (DUST for nucleotide, SEG for protein) |
| Input Format | String |
| Example | To filter low complexity and lookup table only, use: -F "m L" |
Note: Accepted strings: T, F, D, L, R, V, S, C, and m. For more information see:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/URLAPI_node82.html.
| Table 4.8 |
| Option | -G |
| Function | Cost to open a gap |
| Default | 0 |
| Input Format | [Integer] |
| Example | To increase the gap open penalty to 10, use: -G 10 |
Note: Zero invokes default (5) for blastn. It varies for others. For BLAST searches (since version 2.2.13),
only a controlled set of -G/-E value pairs are acceptable for a given scoring matrix.
| Table 4.9 |
| Option | -E |
| Function | Cost to extend a gap |
| Default | 0 |
| Input Format | [Integer] |
| Example | To increase the gap extension penalty to 4, use: -E 4 |
Note: Zero invokes the default (2 for blastn). It varies for others. For more information, see:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/URLAPI_node89.html
| Table 4.10 |
| Option | -X |
| Function | X dropoff value for gapped alignment (in bits) |
| Default | 0 |
| Input Format | [Integer] |
| Example | To increase the gapped alignment dropoff to 40, use: -X 40 |
Note: Gapped Alignment Dropoff Default Settings (in bits)
| Program | blastn | megablast | tblastx | others |
| Value | 30 | 20 | 0 | 15 |
| Table 4.11 |
| Option | -I (capital i) |
| Function | Show GI in definition line |
| Default | F |
| Input Format | [T/F] |
| Example | To activate the GI display use: -I T |
Note: Sample displays for the two settings:
-I T: gi|223046|prf||0410468A ...
-I F: prf||0410468A ...
| Table 4.12 |
| Option | -q |
| Function | Penalty for a nucleotide mismatch |
| Default | -3 |
| Input Format | [Integer] |
| Example | To set penalty to -2, use: -q -2 |
Note: For blastn only, different -r/-q ratios are optimal for aligning sequences
with different percentage of similarities.
| Table 4.13 |
| Option | -r |
| Function | Specifies reward for a nucleotide match |
| Default | 1 |
| Input Format | [Integer] |
| Example | To increase the reward to 2, use: -r 2 |
Note: For blastn only. Others use external scoring matrix to determine this. See
-M table in blastall for more details.
| Table 4.14 |
| Option | -v |
| Function | Specifies the upper limit of database sequences to show descriptions for |
| Default | 500 |
| Input Format | [Integer] |
| Example | To increase the descriptions displayed to 1000 use: -v 1000 |
Note: Web counterpart is "Descriptions". Actual number may be lower due to lack of hits above -e cutoff.
| Table 4.15 |
| Option | -b |
| Function | Specifies the upper limit of database sequences to show alignments for |
| Default | 250 |
| Input Format | [Integer] |
| Example | To increase the alignment displayed to 1000, use: -b 1000 |
Note: Physical limit is 200000. Web counterpart: "Alignments".
This is NOT the total number of alignment segments or high scoring pairs (HSPs).
Rather it is the number of database sequences with HSP(s) to the query.
| Table 4.16 |
| Option | -f |
| Function | Threshold for extending hits |
| Default | 0 |
| Input Format | Integer |
| Example | To increase this threshold to 15, use: -f 15 |
Note: Default is used if set to zero, not relevant to blastn or megablast. Extension threshold default settings are:
| Program | blastp | blastn | blastx | tblastn | tblastx | megablast |
| Value | 11 | 0 | 12 | 13 | 13 | 0 |
| Table 4.17 |
| Option | -g |
| Function | Perform gapped alignment |
| Default | T |
| Input Format | [T/F] |
| Example | To do only ungapped alignment, use: -g F |
Note: Default is gapped alignment, not available with tblastx.
| Table 4.18 |
| Option | -Q |
| Function | Query genetic code to use |
| Default | 1 |
| Input Format | [Integer] |
| Example | To set the genetic code (translation table) to 14, use: -Q 14 |
Note: This specifies the translation table used in query translation during blastx and tblastx searches.
The default is universal codon.
| Table 4.19 |
| Option | -D |
| Function | DB Genetic code |
| Default | 1 |
| Input Format | [Integer] |
| Example | To set the genetic code (translation table) to 14, use: -D 14 |
Note: This specifies the translation table used in the database translation in tblastn and
tblastx searches. Details on the translation tables are at:
www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
| Table 4.20 |
| Option | -a |
| Function | Number of processors to use |
| Default | 1 |
| Input Format | [Integer] |
| Example | To change this to two CPUs, use: -a 2 |
Note: This may not be relevant after the splitd implementation.
| Table 4.21 |
| Option | -O |
| Function | To save SeqAlign object |
| Default | N/A |
| Input Format | String [File Out] |
| Example | To save SeqAlign object to blast_seqalign, use: -O blast_seqalign |
Note: Users can use the output to reformat the result into different formats using
NCBI toolkit function. See ftp://ftp.ncbi.nih.gov/blast/demo/ subdirectory
for more information.
| Table 4.22 |
| Option | -J |
| Function | Believe the query definition line |
| Default | F |
| Input Format | [T/F] |
| Example | To set this to true, use: -J T |
Note: The default is set to false since most query deflines do not follow NCBI convention.
| Table 4.23 |
| Option | -M |
| Function | Protein scoring matrix to use |
| Default | BLOSUM62 |
| Input Format | [String] |
| Example | To change this to PAM30, use: -M PAM30 |
Note: Accepted inputs are: BLOSUM45, BLOSUM62, BLOSUM80, PAM30, or PAM70.
| Table 4.24 |
| Option | -W |
| Function | Word size |
| Default | 0 |
| Input Format | [Integer] |
| Example | To set word size to 32, use: -W 32 |
Note: Word size setting for different programs
| Program | blastn | megablast | all others |
| Value | 11 | 28 | 3 |
| Table 4.25 |
| Option | -z |
| Function | Effective length of the database |
| Default | 0 |
| Input Format | [Real] |
| Example | To set this to 10000000, use: -z 10000000 |
Note: If this parameter is omitted or set to zero (0), BLAST will use the actual database size.
| Table 4.26 |
| Option | -K |
| Function | Number of best hits from a region to keep |
| Default | 0 |
| Input Format | [Integer] |
| Example | To keep 200 hits, use: -K 200 |
Note: This selects the specified number of best hits for a given region of the
query for further evaluation. Off by default, 100 recommended if used.
| Table 4.27 |
| Option | -P |
| Function | Use multiple hit |
| Default | 0 |
| Input Format | Integer |
| Example | To do single hit, use: -P 1 |
Note: Zero is for multiple hit, 1 for single hit. Not applicable to blastn.
| Table 4.28 |
| Option | -Y |
| Function | Effective length of the search space |
| Default | 0 |
| Input Format | [Real] |
| Example | To set this to 10000000, use: -Y 10000000 |
Note: This is the product of effective query length and effective database length
- actual length corrected for edge effects. Use zero for actual size.
| Table 4.29 |
| Option | -S |
| Function | Strands of the nucleotide query to use in the search |
| Default | 3 |
| Input Format | [Integer] |
| Example | To search with the reverse complement strand only, use: -S 2 |
Note: -S Input Code And Meaning for blastn, blastx, and tblastx.
| Meaning | Input sequence | Reverse complement | Both |
| Value | 1 | 2 | 3 |
| Table 4.30 |
| Option | -T |
| Function | Produce HTML output |
| Default | F |
| Input Format | [T/F] |
| Example | To generate HTML formatted output, use: -T T |
Note: With -T T, BLAST will hyperlink the matched subject sequences to their actual entries in Entrez.
| Table 4.31 |
| Option | -u |
| Function | Restrict search of database to the subset satisfying the query |
| Default | N/A |
| Input Format | [Entrez Term] in quotes |
| Example | To restrict entries to mRNA use: -u "biomol_mrna[prop]" |
Note: Argument to this parameter is a set of valid Entrez query terms. BLAST server will use the terms
to retrieve a list of GI numbers and use them to restrict the BLAST search to entries specified by the list.
Make sure valid terms are used. For example, it does not make sense to restrict a search to genomic sequences
while searching against the est database. See
Entrez Help
| Table 4.32 |
| Option | -U |
| Function | Use lower case filtering of FASTA sequence |
| Default | F |
| Input Format | [T/F] |
| Example | To turn lowercase filter on, use: -U T |
Note: Make sure that the portions of the query to be masked are in lowercase and the
rest of the sequence is in UPPERCASE.
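A query can be prepared for lowercase filtering with a short script. The Python sketch below upper-cases a sequence and then lower-cases one region so that it will be masked when -U T is used. The function name is hypothetical, and the 1-based, inclusive coordinates are a convention chosen for this sketch.

```python
def mask_region(sequence, start, end):
    """Lower-case positions start..end (1-based, inclusive); upper-case the rest."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("start and end must lie within the sequence")
    upper = sequence.upper()
    return upper[:start - 1] + upper[start - 1:end].lower() + upper[end:]


print(mask_region("ACGTACGTAC", 3, 6))  # prints ACgtacGTAC
```

The masked sequence is then written back to the FASTA query file under its original defline.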
| Table 4.33 |
| Option | -y |
| Function | X dropoff value for ungapped extensions (in bits) |
| Default | 0 |
| Input Format | [Real] |
| Example | To increase the dropoff to 25, use: -y 25 |
Note: Default settings for ungapped alignment X dropoff (-y, in bits)
| Program | blastn | megablast | others |
| Value | 20 | 10 | 7 |
| Table 4.34 |
| Option | -Z |
| Function | X dropoff value for final gapped alignment (in bits) |
| Default | 0 |
| Input Format | [Integer] |
| Example | To increase this dropoff to 60, use: -Z 60 |
Note: Large dropoff value settings may help generate longer alignments.
Default settings for final gapped alignment X dropoff (-Z, in bits)
| Program | blastn | megablast | tblastx | all others |
| Value | 50 | 50 | 0 | 25 |
| Table 4.35 |
| Option | -R |
| Function | Run rpsblast search |
| Default | F |
| Input Format | [T/F] |
| Example | To run rpsblast search, use: -R T |
Note: Setting this to "T" will perform an rpsblast search against the CDD database. It requires an
appropriate -d input. See
Remote Accessible BLAST Databases
| Table 4.36 |
| Option | -n |
| Function | Enable megablast search |
| Default | F |
| Input Format | [T/F] |
| Example | To enable megablast search, use -n T |
Note: Setting this to "T" invokes the megablast algorithm. -W will default to 28 and
queries will be concatenated. This helps speed up the search at the expense of search sensitivity.
| Table 4.37 |
| Option | -L |
| Function | Location on query sequence |
| Default | N/A |
| Input Format | [String] |
| Example | To search with 100 to 400 of a query, use: -L "100,400" |
Note: In -L "100,400", 100 is the start and 400 the end.
| Table 4.38 |
| Option | -A |
| Function | Multiple hits window size |
| Default | 0 |
| Input Format | [Integer] |
| Example | To increase the window size to 50, use: -A 50 |
Note: Default -A setting for different programs
| Program | blastn | megablast | all others |
| Value | 0 | 0 | 40 |
| Table 4.39 |
| Option | -w |
| Function | Frame shift penalty |
| Default | 0 (no penalty) |
| Input Format | [Integer] |
| Example | To set OOF penalty to 10, use: -w 10 |
Note: A non-zero setting invokes the OOF (Out Of Frame) algorithm for blastx.
| Table 4.40 |
| Option | -t |
| Function | Length of the largest intron allowed in tblastn for linking HSPs |
| Default | 0 |
| Input Format | [Integer] |
| Example | To allow linking of HSPs 10000 letters apart, use: -t 10000 |
Note: Zero disables linking. Otherwise, the value specified will be used.