Setup of Command Line BLAST Under Unix, Linux and MacOSX

Tao Tao, Ph.D.
User Service, NCBI, NLM

TOC
  1. Downloading
  2. Installation and Setup
  3. Execution
  4. Technical Assistance

1. Downloading

NCBI provides the command line standalone BLAST programs as a single compressed package. The package is available as archives whose names begin with "blast", built for a variety of computer platforms (hardware/operating system combinations), from the NCBI FTP site under:

ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/
The archives for Unix, Linux, and MacOSX are compressed tar archives with the .tar.gz file extension. They are named according to the following convention:
blast-#.#.#-CHIP-OS.tar.gz
Here #.#.# represents the version number of the current release. If a patched version is made available between official releases, #.#.# will be the date when the patch was created. CHIP indicates the chipset and OS the operating system. The archives and their target platforms are listed in the table below.

Table 1.1 Command Line BLAST Archives and Their Target Platforms
Archive Name                         Target Platform                          Byte Order
blast-#.#.#-axp64-tru64.tar.gz       HP Alpha with Tru64 OS                   Little endian
blast-#.#.#-ia32-freebsd.tar.gz      Pentium-compatible PC with FreeBSD OS    Little endian
blast-#.#.#-ia32-linux.tar.gz        Pentium-compatible PC with Linux OS      Little endian
blast-#.#.#-ia32-solaris*.tar.gz     Pentium-compatible PC with Solaris OS    Little endian
blast-#.#.#-ia64-linux.tar.gz        Itanium PC with 64-bit Linux OS          Little endian
blast-#.#.#-mips64-irix.tar.gz       64-bit MIPS processor with IRIX OS       Big endian
blast-#.#.#-ppc64-aix.tar.gz         64-bit PowerPC with IBM AIX OS           Big endian
blast-#.#.#-universal-macosx.tar.gz  IA-32 and 32-bit PowerPC with Mac OS X   Big endian (PowerPC), little endian (Intel)
blast-#.#.#-sparc64-solaris.tar.gz   64-bit SPARC processor with Solaris 10   Big endian
blast-#.#.#-sparc64-solaris8.tar.gz  64-bit SPARC processor with Solaris 8    Big endian
blast-#.#.#-x64-linux.tar.gz         x64 (AMD64/EM64T) with 64-bit Linux OS   Little endian
blast-#.#.#-x64-solaris.tar.gz       x64 (AMD64/EM64T) with Solaris OS        Little endian
Note: rpsblast databases are platform dependent; use a database built for the byte order of your platform.
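The naming convention above can be checked mechanically, for example before a script picks an archive to download. The function below is a minimal sketch, assuming a POSIX shell such as bash and assuming that none of the version, CHIP, or OS fields contains a hyphen (true for every name in Table 1.1). The name parse_blast_archive is hypothetical; it is not part of the BLAST package.

```shell
# parse_blast_archive NAME: print "VERSION CHIP OS" for an archive named
# blast-VERSION-CHIP-OS.tar.gz (any leading directory is ignored).
parse_blast_archive() {
    name=${1##*/}                      # strip a leading path, if any
    case $name in
        blast-*.tar.gz) ;;
        *) echo "not a BLAST archive name: $name" >&2; return 1 ;;
    esac
    stem=${name%.tar.gz}               # blast-2.2.15-x64-linux
    stem=${stem#blast-}                # 2.2.15-x64-linux
    version=${stem%%-*}                # 2.2.15
    rest=${stem#*-}                    # x64-linux
    chip=${rest%%-*}                   # x64
    os=${rest#*-}                      # linux
    # If a hyphen was missing, one of the expansions above matched nothing.
    if [ "$rest" = "$stem" ] || [ "$os" = "$rest" ]; then
        echo "unexpected archive name: $name" >&2
        return 1
    fi
    printf '%s %s %s\n' "$version" "$chip" "$os"
}
```

For example, parse_blast_archive blast-2.2.15-x64-linux.tar.gz prints 2.2.15 x64 linux.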

We can download the archive for our platform in binary mode using an FTP client, a web browser, or another tool. NCBI does not provide BLAST archives for platforms other than those listed above; on such platforms, users need to compile the NCBI Toolkit to obtain the BLAST programs. The current NCBI Toolkit release can be downloaded from:

ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools/CURRENT/ncbi.tar.gz
Questions on compiling BLAST from the toolkit should be addressed to:

toolbox@ncbi.nlm.nih.gov
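The binary-mode download can also be scripted. The sketch below assumes that the archive for a given version sits directly under the LATEST directory given in this section, with a name built from the blast-#.#.#-CHIP-OS.tar.gz convention, and that curl or wget is installed. The helper names blast_archive_url and fetch_blast_archive are hypothetical. Both curl and wget use binary FTP transfers by default, so the archive is not altered in transit.

```shell
# Base directory taken from the text above.
BLAST_FTP=ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST

# blast_archive_url VERSION CHIP-OS: print the URL for one archive.
blast_archive_url() {
    if [ $# -ne 2 ]; then
        echo "usage: blast_archive_url VERSION CHIP-OS" >&2
        return 1
    fi
    printf '%s/blast-%s-%s.tar.gz\n' "$BLAST_FTP" "$1" "$2"
}

# fetch_blast_archive VERSION CHIP-OS: download the archive into the
# current directory with curl or wget (binary FTP transfers by default).
fetch_blast_archive() {
    url=$(blast_archive_url "$@") || return 1
    if command -v curl >/dev/null 2>&1; then
        curl -O "$url"                 # -O keeps the remote file name
    elif command -v wget >/dev/null 2>&1; then
        wget "$url"
    else
        echo "neither curl nor wget found; download $url manually" >&2
        return 1
    fi
}
```

For example, fetch_blast_archive 2.2.15 x64-linux, with the version number of the current release substituted for 2.2.15. If the site layout has changed, only BLAST_FTP needs editing.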

2. Installation and Setup

To install, first place the downloaded archive in the desired directory (generally the user's home directory). Decompress the archive with gunzip, then extract it with tar:

gunzip -d blast-#.#.#-x64-linux.tar.gz
tar xvpf blast-#.#.#-x64-linux.tar
With newer versions of tar, which can decompress the archive directly, a single command also works:
tar zxvpf blast-#.#.#-x64-linux.tar.gz
Successful extraction creates a blast-#.#.# directory, which contains three subdirectories (bin, doc, and data) and a VERSION file. The bin subdirectory contains the programs listed below.

Table 2.1 Programs Contained in the BLAST Archive and Their Functions
Program      Function
bl2seq       Directly compares two FASTA sequences
blastall     Traditional BLAST, with blastn, blastp, blastx, tblastn, and tblastx functions
blastclust   Clusters the input FASTA sequences
blastpgp     Standalone PSI-BLAST, for finding distantly related protein sequences and generating position-specific score matrices
copymat      Copies old blastpgp output for use as input to makemat
fastacmd     Retrieves or dumps sequences from a formatted BLAST database
formatdb     Converts a FASTA-formatted sequence file into a BLAST database
formatrpsdb  Formats scoremat files into an RPS-BLAST database
impala       Protein profile search program, mostly replaced by rpsblast
makemat      Converts copymat files into scoremat format; no longer needed for output from newer blastpgp versions
megablast    Faster batch blastn program that uses a greedy algorithm; works in contiguous or more sensitive discontiguous mode
rpsblast     Reverse PSI-BLAST program for searching against a conserved domain database
seedtop      Pattern search program
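The extraction step and the resulting layout (bin, doc, data, and a VERSION file inside blast-#.#.#) can be checked in one go. This is a minimal sketch, assuming a POSIX shell with gunzip and tar available and assuming the archive unpacks into a single blast-#.#.# directory as described in this section; extract_blast_archive is a hypothetical name. Piping gunzip -c into tar works even with tar versions that lack the z option and does not leave an uncompressed .tar file behind.

```shell
# extract_blast_archive ARCHIVE: unpack a blast-#.#.#-CHIP-OS.tar.gz archive
# in the current directory, check the expected layout, print the directory.
extract_blast_archive() {
    archive=$1
    if [ ! -f "$archive" ]; then
        echo "no such archive: $archive" >&2
        return 1
    fi
    # Same effect as gunzip followed by tar xvpf, without the intermediate
    # .tar file (p preserves permissions; v is dropped to keep output quiet).
    gunzip -c "$archive" | tar xpf - || return 1
    dir=$(basename "$archive" .tar.gz)   # blast-2.2.15-x64-linux
    dir=${dir%-*-*}                      # blast-2.2.15 (drops -CHIP-OS)
    for sub in bin doc data; do
        if [ ! -d "$dir/$sub" ]; then
            echo "missing $dir/$sub" >&2
            return 1
        fi
    done
    if [ ! -f "$dir/VERSION" ]; then
        echo "missing $dir/VERSION" >&2
        return 1
    fi
    echo "$dir"
}
```

For example, extract_blast_archive blast-2.2.15-x64-linux.tar.gz prints blast-2.2.15 once the archive has been unpacked and checked.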

Documentation for the individual programs is in the doc subdirectory. The data subdirectory contains the matrices used to score protein alignments, along with files needed by other NCBI programs. To simplify management of the database files, we should also create a subdirectory named db under blast-#.#.# to hold the BLAST databases: cd to the blast-#.#.# directory and type mkdir db.
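The mkdir db step can be scripted so that it also reports the absolute paths of data and db, which are the values the .ncbirc configuration file needs. A sketch assuming a POSIX shell and an already extracted blast-#.#.# directory; make_blast_db_dir is a hypothetical name. mkdir -p makes it safe to run more than once.

```shell
# make_blast_db_dir BLAST_DIR: create BLAST_DIR/db and print the absolute
# data and db paths for use in .ncbirc.
make_blast_db_dir() {
    blast_dir=$1
    if [ ! -d "$blast_dir/data" ]; then
        echo "$blast_dir/data not found; is this the extracted blast-#.#.# directory?" >&2
        return 1
    fi
    mkdir -p "$blast_dir/db" || return 1   # -p: no error if db already exists
    # Subshell, so the caller's working directory is left unchanged.
    (cd "$blast_dir" && printf 'DATA=%s/data\nBLASTDB=%s/db\n' "$PWD" "$PWD")
}
```

Run from the home directory, make_blast_db_dir blast-2.2.15 creates blast-2.2.15/db and prints the DATA= and BLASTDB= lines for that location.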

MacOSX users need to open a terminal window to run these commands by double-clicking the Terminal icon, which is generally in the Utilities folder. It can also be located by searching with Sherlock or Mac Help.

To ensure smooth execution of the BLAST programs, we should set up a BLAST configuration file named .ncbirc that provides the paths to the data and db directories. If the blast-#.#.# directory is placed under the home directory of user j_smith, the paths to the data and db directories can be specified with the lines below.

[NCBI]
DATA=/home/j_smith/blast-#.#.#/data

[BLAST]
BLASTDB=/home/j_smith/blast-#.#.#/db
The resulting .ncbirc file must be placed in the home directory of j_smith. If the data and/or db directories are moved elsewhere, the paths in .ncbirc must be changed to reflect the move.
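Writing .ncbirc from a script avoids typos in the section names and paths. This is a minimal sketch, assuming a POSIX shell and that the blast-#.#.# directory is passed as an absolute path; write_ncbirc is a hypothetical name. Backing up an existing .ncbirc to .ncbirc.bak before overwriting it is a choice of this sketch, not something BLAST requires.

```shell
# write_ncbirc BLAST_DIR [HOME_DIR]: write HOME_DIR/.ncbirc (default $HOME)
# pointing DATA and BLASTDB at BLAST_DIR/data and BLAST_DIR/db.
write_ncbirc() {
    if [ -z "$1" ]; then
        echo "usage: write_ncbirc BLAST_DIR [HOME_DIR]" >&2
        return 2
    fi
    blast_dir=$1
    rc="${2:-$HOME}/.ncbirc"
    if [ -e "$rc" ]; then
        cp "$rc" "$rc.bak" || return 1   # keep the previous settings
    fi
    cat > "$rc" <<EOF
[NCBI]
DATA=$blast_dir/data

[BLAST]
BLASTDB=$blast_dir/db
EOF
}
```

For user j_smith, write_ncbirc /home/j_smith/blast-2.2.15 produces the same lines as the example above, with the version number filled in.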

At startup, BLAST reads this file to get the path information it needs during searches. Without this file, BLAST looks for the two directories in the working directory, that is, the directory where the command is issued. Failure to locate the data directory may result in the following warning messages during protein searches:

[NULL_Caption] WARNING: [000.000] "query name" :Unable to open BLOSUM62
[NULL_Caption] WARNING: [000.000] "query name" :BlastScoreBlkMatFill returned non-zero status
[NULL_Caption] WARNING: [000.000] "query name" :SetUpBlastSearch failed.
Since version 2.2.14, the scoring matrices are hardcoded in blastall, which makes the external matrix files unnecessary. However, protein searches that use the old BLAST engine (-V T) still need these files.
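When the warnings above appear, a quick check is whether .ncbirc exists, has a DATA entry, and points at a directory that actually contains the matrix file. This is a sketch, assuming a POSIX shell with awk, KEY=VALUE lines with no spaces around "=" as in the example .ncbirc above, and a matrix file named BLOSUM62 in the data directory (as the warning message suggests). The names ncbirc_value and check_blast_data are hypothetical.

```shell
# ncbirc_value FILE SECTION KEY: print the value of KEY in [SECTION] of FILE.
ncbirc_value() {
    awk -v section="[$2]" -v key="$3" '
        /^\[/ { in_section = ($0 == section); next }
        in_section {
            n = index($0, "=")
            if (n > 0 && substr($0, 1, n - 1) == key) { print substr($0, n + 1); exit }
        }' "$1"
}

# check_blast_data [NCBIRC]: report whether the DATA directory named in
# .ncbirc (default: $HOME/.ncbirc) contains the BLOSUM62 matrix file.
check_blast_data() {
    rc=${1:-$HOME/.ncbirc}
    if [ ! -f "$rc" ]; then
        echo "$rc not found; BLAST will look in the working directory" >&2
        return 1
    fi
    data=$(ncbirc_value "$rc" NCBI DATA)
    if [ -z "$data" ]; then
        echo "no DATA= entry under [NCBI] in $rc" >&2
        return 1
    fi
    if [ ! -f "$data/BLOSUM62" ]; then
        echo "BLOSUM62 not found in $data" >&2
        return 1
    fi
    echo "ok: $data"
}
```

Running check_blast_data with no argument inspects $HOME/.ncbirc and prints the data directory on success, or the first problem it finds.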

To be able to call the BLAST programs from any directory, we need to modify the $PATH environment variable by appending a colon (:) followed by the path /home/j_smith/blast-#.#.#/bin. We can view the current content of this variable with echo, as shown in the example below.

echo $PATH

/usr/X11R6/bin:/usr/bin:/bin:/usr/local/bin:/opt/local/bin:/home/j_smith/bin
Consult your Unix/Linux system administrator on how to modify this variable for your system.
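For the current shell session, the change can be made with a function like the one below. It is a sketch assuming a POSIX shell such as bash; add_to_path is a hypothetical name. It appends the directory only if it is not already listed, so running it twice does not create duplicate entries.

```shell
# add_to_path DIR: append DIR to $PATH for the current shell session,
# unless it is already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                       # already present: nothing to do
        *) PATH=${PATH:+$PATH:}$1 ;;       # append ":DIR" (just DIR if PATH is empty)
    esac
    export PATH
}
```

For example, add_to_path /home/j_smith/blast-#.#.#/bin with the version number filled in. To make the change permanent, the equivalent line export PATH="$PATH:/home/j_smith/blast-#.#.#/bin" is commonly placed in the shell's start-up file (for bash, typically ~/.bashrc or ~/.bash_profile); your system administrator can confirm the right file for your system.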

3. Execution

With the above setup, we can call the programs by name from any directory, without specifying their path on the command line. For example, to see all the parameters (switches) of a BLAST program, type the program name followed by a space and a dash, then press Return. The program prints its parameters, with brief explanations, to the screen. A sample command and partial output for blastall are given below.

blastall -

blastall 2.2.15   arguments:

  -p  Program Name [String]
  -d  Database [String]
    default = nr
  -i  Query File [File In]
    default = stdin
  -e  Expectation value (E) [Real]
    default = 10.0
  -m  alignment view options:
0 = pairwise,
1 = query-anchored showing identities,
...
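Before running searches, it can help to confirm which of the programs in Table 2.1 the shell actually finds. A sketch assuming a POSIX shell; check_blast_programs is a hypothetical name. It uses the shell builtin command -v, prints where each program was found, and returns a non-zero status if any program is missing.

```shell
# check_blast_programs: list each program from Table 2.1 with the path the
# shell would run, and fail if any of them is not found on $PATH.
check_blast_programs() {
    missing=0
    for prog in bl2seq blastall blastclust blastpgp copymat fastacmd formatdb \
                formatrpsdb impala makemat megablast rpsblast seedtop; do
        if path=$(command -v "$prog" 2>/dev/null); then
            printf '%-12s %s\n' "$prog" "$path"
        else
            printf '%-12s NOT FOUND\n' "$prog"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

A NOT FOUND entry usually means the bin directory has not been added to $PATH in the current shell.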

To run blastall with its blastn subprogram, searching the refseq_rna database with the file fasta_query.txt as the query and saving the result in output.txt, we use this command line:

blastall -p blastn -d refseq_rna -i fasta_query.txt -o output.txt
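The same search can be wrapped in a small shell function that checks its inputs first, so that a missing query file or an unset $PATH is reported before blastall runs. This is a sketch, not an NCBI-supplied script; run_blastn is a hypothetical name, and it passes only the four switches used in the command line above.

```shell
# run_blastn QUERY DB OUTPUT: blastn search of QUERY against DB, report to OUTPUT.
run_blastn() {
    if [ $# -ne 3 ]; then
        echo "usage: run_blastn QUERY DB OUTPUT" >&2
        return 2
    fi
    query=$1 db=$2 out=$3
    if [ ! -r "$query" ]; then
        echo "cannot read query file: $query" >&2
        return 1
    fi
    if ! command -v blastall >/dev/null 2>&1; then
        echo "blastall is not on \$PATH; see the PATH setup in Section 2" >&2
        return 1
    fi
    # -p program, -d database, -i query file, -o output file (as above).
    blastall -p blastn -d "$db" -i "$query" -o "$out"
}
```

With the setup from Section 2, run_blastn fasta_query.txt refseq_rna output.txt runs the command shown above and returns blastall's exit status.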

If you cannot modify the $PATH variable, you need to call the program with explicit path information prefixed to the program name. For example, to call blastall from within /home/j_smith/blast-2.2.13/, we use:

./bin/blastall -
The ./ prefix instructs the shell to look for the program in the current directory; ./bin/ instructs it to look in the bin subdirectory of the current directory.
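A script that has to work whether or not $PATH was changed can decide how to call blastall at run time. The sketch below assumes a POSIX shell; find_blastall is a hypothetical name. It prefers a blastall found on $PATH, then one under a given blast-#.#.# directory, then ./bin/blastall relative to the current directory, and prints the path to use.

```shell
# find_blastall [BLAST_DIR]: print how to invoke blastall. Prefers a copy on
# $PATH, then BLAST_DIR/bin/blastall, then ./bin/blastall.
find_blastall() {
    if command -v blastall >/dev/null 2>&1; then
        command -v blastall
    elif [ -n "$1" ] && [ -x "$1/bin/blastall" ]; then
        printf '%s\n' "$1/bin/blastall"
    elif [ -x ./bin/blastall ]; then
        printf '%s\n' ./bin/blastall
    else
        echo "blastall not found on \$PATH or under a bin subdirectory" >&2
        return 1
    fi
}
```

For example, BLASTALL=$(find_blastall /home/j_smith/blast-2.2.13) && "$BLASTALL" - prints the blastall parameter list whichever way the program is reachable.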

To further customize the search, we can adjust the relevant search parameters by referring to the parameter list in Section 3 of this document:

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall

A sample command line to search a query mRNA in the file query_0720.txt against the refseq_rna database using blastn is given below.

blastall -i query_0720.txt -d refseq_rna -p blastn ...
Note that refseq_rna is an NCBI-provided database. It is available in preformatted form from the db subdirectory of the NCBI FTP site:

ftp://ftp.ncbi.nih.gov/blast/db
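Once the preformatted archives for a database have been downloaded, they need to be unpacked into the directory that BLASTDB in .ncbirc points to. A minimal sketch, assuming a POSIX shell and that the downloads are gzip-compressed tar files (.tar.gz) in the current directory; install_blastdb is a hypothetical name, and the refseq_rna.*.tar.gz file names in the usage line are illustrative, so check the actual names on the FTP site.

```shell
# install_blastdb DB_DIR ARCHIVE...: unpack downloaded, preformatted database
# archives (.tar.gz) into DB_DIR, the directory named by BLASTDB in .ncbirc.
install_blastdb() {
    if [ $# -lt 2 ]; then
        echo "usage: install_blastdb DB_DIR ARCHIVE..." >&2
        return 2
    fi
    db_dir=$1
    shift
    mkdir -p "$db_dir" || return 1
    for archive in "$@"; do
        if [ ! -f "$archive" ]; then
            echo "no such archive: $archive" >&2
            return 1
        fi
        # gunzip reads from the current directory; tar extracts inside DB_DIR.
        gunzip -c "$archive" | (cd "$db_dir" && tar xf -) || return 1
    done
}
```

For example: install_blastdb /home/j_smith/blast-#.#.#/db refseq_rna.*.tar.gz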

For more information, please see:

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastdb.html

4. Technical Assistance

For questions, feedback, and technical assistance, please contact blast-help by email:

blast-help@ncbi.nlm.nih.gov

For questions on other NCBI resources, please write to:

info@ncbi.nlm.nih.gov