- Downloading
- Installation and Setup
- Execution
- Technical Assistance
1. Downloading
NCBI provides the command line standalone BLAST programs as a single compressed package. The package is
available as archives whose names begin with blast, one for each supported computer platform (hardware/operating system combination),
from the NCBI FTP site under:
ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/
The archives for Unix, Linux, and Mac OS X are compressed tar archives with the .tar.gz file extension.
These archives are named in the following convention:
blast-#.#.#-CHIP-OS.tar.gz
Here #.#.# represents the version number of the current release. If a patched version is made available between
official releases, #.#.# will be the date when the patch was created. CHIP indicates the chipset, and OS
the operating system. The archives and their target platforms are listed in the table below.
| Table 1.1 Command Line BLAST Archives and Their Target Platforms |
| Archive Name | Target Platform | Byte Order |
| blast-#.#.#-axp64-tru64.tar.gz | HP Alpha with tru64 OS | Big endian |
| blast-#.#.#-ia32-freebsd.tar.gz | Pentium Compatible PC with FreeBSD OS | Little endian |
| blast-#.#.#-ia32-linux.tar.gz | Pentium Compatible PC with Linux OS | Little endian |
| blast-#.#.#-ia32-solaris*.tar.gz | Pentium Compatible PC with Solaris OS | Little endian |
| blast-#.#.#-ia64-linux.tar.gz | Itanium PC with 64-bit Linux OS | Little endian |
| blast-#.#.#-mips64-irix.tar.gz | 64-bit MIPS processor with IRIX OS | Big endian |
| blast-#.#.#-ppc64-aix.tar.gz | 64-bit PowerPC running IBM AIX OS | Big endian |
| blast-#.#.#-universal-macosx.tar.gz | ia32 and PowerPC32 running Mac OS X | Big endian (ppc), little endian (ia32) |
| blast-#.#.#-sparc64-solaris.tar.gz | 64-bit SPARC processor with Solaris 10 | Big endian |
| blast-#.#.#-sparc64-solaris8.tar.gz | 64-bit SPARC processor with Solaris 8 | Big endian |
| blast-#.#.#-x64-linux.tar.gz | x64 (AMD64/EM64T) running 64-bit Linux OS | Little endian |
| blast-#.#.#-x64-solaris.tar.gz | x64 (AMD64/EM64T) running Solaris OS | Little endian |
Note: rpsblast databases are platform dependent.
We can download the archive for our platform in binary mode through an FTP client, a web browser, or other tools.
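The naming convention described earlier can be turned into a small shell sketch that assembles the archive name and its download URL. The version number 2.2.15 and the helper name chip_for are placeholders chosen for illustration, and the mapping from machine type to CHIP is an assumption that covers only a few rows of Table 1.1.

```shell
# Map the machine type reported by `uname -m` to the CHIP field of the
# archive name. The mapping is an assumption covering a few common cases.
chip_for() {
  case "$1" in
    x86_64|amd64) echo "x64" ;;
    i?86)         echo "ia32" ;;
    ia64)         echo "ia64" ;;
    *)            echo "unknown" ;;
  esac
}

VERSION="2.2.15"            # placeholder; use the current release number
CHIP=$(chip_for "x86_64")   # or: chip_for "$(uname -m)"
OS="linux"

ARCHIVE="blast-${VERSION}-${CHIP}-${OS}.tar.gz"
URL="ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/${ARCHIVE}"
echo "$URL"
```

The printed URL can then be handed to an FTP client or a web browser.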
NCBI does not provide blast archives for other platforms. Instead, users of those platforms will need to compile the NCBI toolkit
to get the BLAST binary programs. The NCBI Toolkit release can be downloaded from:
ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools/CURRENT/ncbi.tar.gz
Users with questions on recompilation from the toolkit should address their questions to:
toolbox@ncbi.nlm.nih.gov
2. Installation and Setup
To install, first place the downloaded archive under a desired directory (generally the home directory of the user).
Inflate the archive using gunzip followed by extraction using tar.
gunzip blast-#.#.#-x64-linux.tar.gz
tar xvpf blast-#.#.#-x64-linux.tar
With newer versions of tar, the following single command also works:
tar zxvpf blast-#.#.#-x64-linux.tar.gz
Successful extraction will generate a blast-#.#.# directory, which contains
three subdirectories (bin, doc, and data) and a VERSION file. The bin subdirectory contains the programs listed
below.
| Table 2.1 Programs contained in the blast ftp archive and their functions |
| Program | Function |
| bl2seq | Directly compares two FASTA sequences |
| blastall | Traditional BLAST with blastn, blastp, blastx, tblastn, and tblastx functions |
| blastclust | Clusters input FASTA sequences |
| blastpgp | Standalone PSI-BLAST for searching distantly related protein sequences and generating position-specific matrices |
| copymat | Copies old blastpgp output for input to makemat |
| fastacmd | Retrieves or dumps sequences from a formatted BLAST database |
| formatdb | Converts a FASTA-formatted sequence file into a BLAST database |
| formatrpsdb | Formats scoremat files into an RPSBLAST database |
| impala | Protein profile search program, mostly replaced by rpsblast |
| makemat | Converts the copymat files into scoremat format; no longer needed with new blastpgp output |
| megablast | Faster batch blastn program that uses a greedy algorithm; works in contiguous or more sensitive discontiguous modes |
| rpsblast | Reverse PSI-BLAST program for searching against a conserved domain database |
| seedtop | Pattern search program |
Documents for individual programs are in the doc subdirectory. The data subdirectory contains matrices
for scoring the protein alignments, along with files needed by other NCBI programs. To facilitate
the management of database files, we should also create a subdirectory named db under blast-#.#.# to house
the blast databases: cd to the blast-#.#.# directory and type mkdir db.
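As a minimal sketch, the check of the extracted layout and the creation of the db subdirectory can be combined in one step. The helper name setup_blast_db_dir and the version number 2.2.15 are hypothetical and used only for illustration.

```shell
# Hypothetical helper: check that DIR looks like an unpacked BLAST
# archive (bin, doc, data, VERSION), then create the db subdirectory.
setup_blast_db_dir() {
  dir=$1
  for sub in bin doc data; do
    if [ ! -d "$dir/$sub" ]; then
      echo "missing subdirectory: $dir/$sub" >&2
      return 1
    fi
  done
  if [ ! -f "$dir/VERSION" ]; then
    echo "missing file: $dir/VERSION" >&2
    return 1
  fi
  mkdir -p "$dir/db"
}

# Example call; the version number is a placeholder.
setup_blast_db_dir "$HOME/blast-2.2.15" || echo "directory not set up" >&2
```

The function returns a non-zero status without creating anything if the directory does not have the expected layout.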
Mac OS X users will need to launch a terminal window by double-clicking the Terminal icon, which is generally under
the Utilities folder. One can also locate it by searching within Sherlock or Mac Help.
To ensure the smooth execution of blast programs, we should set up a BLAST configuration file
named .ncbirc that provides the path information for the data and db directories. If we
place the blast-#.#.# directory under the home directory of j_smith, we can specify the
paths to the data and db directories using the lines below.
[NCBI]
DATA=/home/j_smith/blast-#.#.#/data
[BLAST]
BLASTDB=/home/j_smith/blast-#.#.#/db
We need to place the resulting .ncbirc file under the home directory of j_smith. If we move the data and/or db directories
somewhere else, we will need to change the path specification in .ncbirc to reflect the change.
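One way to generate the file is with a shell here-document, sketched below. It assumes that the data and db directories sit inside the unpacked blast directory under the home directory; 2.2.15 stands in for the real version number, and the variable names BLAST_HOME and NCBIRC are chosen for illustration only.

```shell
# Assumptions: the archive was unpacked under the home directory, and
# 2.2.15 stands in for the real version number.
BLAST_HOME="${BLAST_HOME:-$HOME/blast-2.2.15}"
NCBIRC="${NCBIRC:-$HOME/.ncbirc}"

# Note: this overwrites any existing file at $NCBIRC.
cat > "$NCBIRC" <<EOF
[NCBI]
DATA=$BLAST_HOME/data
[BLAST]
BLASTDB=$BLAST_HOME/db
EOF

# Show the result for a quick visual check.
cat "$NCBIRC"
```

If the directories are moved later, rerunning the snippet with a different BLAST_HOME regenerates the file with the new paths.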
On startup, BLAST reads this file to get the path information it needs during BLAST searches. Without this file,
BLAST searches the working directory (the directory where the command is issued) to try to locate those two directories. Failure to
locate the data directory may result in the error messages below during protein searches.
[NULL_Caption] WARNING: [000.000] "query name" :Unable to open BLOSUM62
[NULL_Caption] WARNING: [000.000] "query name" :BlastScoreBlkMatFill returned non-zero status
[NULL_Caption] WARNING: [000.000] "query name" :SetUpBlastSearch failed.
Since version 2.2.14, the matrix files have been hardcoded in blastall, which makes the external matrix files unnecessary. However,
protein alignments using the old blast engine (-V T) still need these files.
If we want to be able to call blast programs from any directory under /home/j_smith, we need to change the $PATH
environment variable by appending a colon (:) followed by the path /home/j_smith/blast-#.#.#/bin.
We can see the content of this variable using echo, as shown in the example below.
echo $PATH
/usr/X11R6/bin:/usr/bin:/bin:/usr/local/bin:/opt/local/bin:/home/j_smith/bin
Consult your Unix/Linux system administrator on how to modify this variable for your system.
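For Bourne-style shells such as sh or bash, the change can be sketched as follows. The version number 2.2.15 is a placeholder, and the setting lasts only for the current session unless the lines are added to a shell startup file; other shells, such as csh, use a different syntax.

```shell
# Assumed install location; replace 2.2.15 with the version you unpacked.
BLAST_BIN="$HOME/blast-2.2.15/bin"

# Append the directory only if it is not already on PATH.
case ":$PATH:" in
  *":$BLAST_BIN:"*) ;;                     # already present, nothing to do
  *) PATH="$PATH:$BLAST_BIN" ;;            # colon, then the new directory
esac
export PATH

echo "$PATH"
```

The case guard keeps the directory from being appended again when the snippet is run more than once.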
3. Execution
With the above setup, we will be able to call the programs by name from any directory
on the computer without specifying the path on the command line. For example, to see
all the parameters or switches of a blast program, we can type the program name followed
by a space and a dash, and hit return. This makes the blast program print its parameters,
with brief explanations, to the screen. A sample command and partial output for blastall are given below.
blastall -
blastall 2.2.15 arguments:
-p Program Name [String]
-d Database [String]
default = nr
-i Query File [File In]
default = stdin
-e Expectation value (E) [Real]
default = 10.0
-m alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
...
To run blastall with its blastn subprogram, search against the refseq_rna database using the fasta_query.txt file as the query,
and save the result in output.txt, we use this command line:
blastall -p blastn -d refseq_rna -i fasta_query.txt -o output.txt
If you cannot modify the $PATH variable, you will need to call the program with explicit path information prefixed to
the program name. For example, to call blastall from /home/j_smith/blast-2.2.13/, prefix the program name with ./bin/ (that is, ./bin/blastall).
./ instructs the shell to search for the program under the current directory.
./bin/ instructs the shell to look in the bin subdirectory under the current directory.
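A minimal sketch of such an explicit-path call is given here, reusing the query, database, and output names from the earlier blastall example. It assumes the current directory is the top of the blast directory; the guard is an addition for illustration that reports a missing binary instead of letting the shell fail.

```shell
# Run from the top of the blast directory, e.g. /home/j_smith/blast-2.2.13/
BLASTALL="./bin/blastall"

if [ -x "$BLASTALL" ]; then
  # Same search as before, but with the program located by explicit path.
  "$BLASTALL" -p blastn -d refseq_rna -i fasta_query.txt -o output.txt
else
  echo "blastall not found at $BLASTALL" >&2
fi
```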
To further customize the search, we can manipulate the relevant search parameters by referring
to the parameter list in Section 3 of this file:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall
A sample command line to search a query mRNA named query_0720.txt against the refseq_rna database using blastn is given below.
blastall -i query_0720.txt -d refseq_rna -p blastn ...
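As an illustration of adjusting search parameters, the sketch below wraps a blastn search in a small shell function that takes the query file, the expectation value (-e), and the alignment view (-m) as arguments. The function name run_blastn and the DRY_RUN switch are hypothetical conveniences, not part of BLAST; the switches themselves are the ones shown in the blastall parameter listing.

```shell
# Hypothetical helper: prints the command when DRY_RUN=1, otherwise runs it.
run_blastn() {
  query=$1    # FASTA query file
  evalue=$2   # expectation value threshold (-e)
  view=$3     # alignment view option (-m), e.g. 0 = pairwise
  set -- blastall -p blastn -d refseq_rna -i "$query" \
         -e "$evalue" -m "$view" -o output.txt
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Preview the command without running a search.
DRY_RUN=1 run_blastn query_0720.txt 0.001 1
# prints: blastall -p blastn -d refseq_rna -i query_0720.txt -e 0.001 -m 1 -o output.txt
```

Calling the function without DRY_RUN=1 runs the search itself, which requires blastall on the $PATH and the refseq_rna database in the db directory.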
Note that refseq_rna is an NCBI-provided database. It is available in preformatted form from the db subdirectory of the NCBI FTP site:
ftp://ftp.ncbi.nih.gov/blast/db
For more information, please see:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastdb.html
4. Technical Assistance
For questions, feedback, and technical assistance, please contact blast-help via email:
blast-help@ncbi.nlm.nih.gov
For questions on other NCBI resources, please write to:
info@ncbi.nlm.nih.gov