NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

BLAST® Help [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2008-.


Standalone BLAST Setup for Unix

, Ph.D.

Created: ; Last Update: April 18, 2014.

Introduction

NCBI provides the command-line standalone blast+ programs (based on the NCBI C++ toolkit) as a single compressed package. The package is available as archives whose names begin with "ncbi-", for a variety of computer platforms (hardware/operating system combinations), at:

ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/

The archives for Linux and Mac OSX are gzip-compressed tar files named using the following convention:

ncbi-blast-#.#.#+-CHIP-OS.tar.gz

Here, the #.#.# represents the version number of the current release, CHIP indicates the chipset, and OS indicates the operating system. Equivalent .rpm and .dmg files for Linux and Mac OSX are also available. These archives and their target platforms are listed in the table below.

Table 1

Executable blast+ package available from NCBI

Archive Name                                Chipset       OS              File Type
ncbi-blast-#.#.#+-ia32-linux.tar.gz         Pentium chip  Linux, 32 bit   gzip'd tar archive
ncbi-blast-#.#.#+-universal-macosx.tar.gz   ppc/intel     Mac OSX         gzip'd tar archive
ncbi-blast-#.#.#+-x64-linux.tar.gz          X64 chip      Linux, 64 bit   gzip'd tar archive
ncbi-blast-#.#.#+.dmg                       ppc/intel     Mac OSX         gzip'd disk image
ncbi-blast-#.#.#+-1.i686.rpm                Pentium chip  Linux, 32 bit   rpm
ncbi-blast-#.#.#+-1.x86_64.rpm              X64 chip      Linux, 64 bit   rpm
Note: rpsblast databases are platform dependent.
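Because the naming convention is regular, archive names can be assembled in a script. The POSIX shell sketch below does this; blast_archive_name, blast_archive_url, and BLAST_FTP_DIR are hypothetical names introduced here, and version 2.2.29 comes from the examples in this document, so the version actually present under LATEST/ will likely differ.

```shell
# Hypothetical helpers (not part of blast+) that assemble names following
# the convention ncbi-blast-#.#.#+-CHIP-OS.tar.gz.
BLAST_FTP_DIR="ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST"

# Print the archive name for a version, chipset and OS, e.g. 2.2.29 x64 linux.
blast_archive_name() {
    printf 'ncbi-blast-%s+-%s-%s.tar.gz\n' "$1" "$2" "$3"
}

# Print the full download URL for the same archive.
blast_archive_url() {
    printf '%s/%s\n' "$BLAST_FTP_DIR" "$(blast_archive_name "$1" "$2" "$3")"
}

blast_archive_name 2.2.29 x64 linux
blast_archive_url 2.2.29 universal macosx
```

Run as shown, the two calls print ncbi-blast-2.2.29+-x64-linux.tar.gz and ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/ncbi-blast-2.2.29+-universal-macosx.tar.gz.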

The installation processes for the disk image (.dmg) on Mac OSX and the Red Hat Package Manager (.rpm) package on Linux are different and are not discussed here. The installation of the legacy BLAST package based on the NCBI C toolkit (deprecated, with version 2.2.26 as its last release) is described briefly at the end of this tutorial.

Downloading

The blast+ packages for the various platforms can be downloaded through anonymous ftp using an ftp client, or with other tools such as a web browser, wget, or curl. The example working session below demonstrates an ftp download using the traditional ftp client in a Linux environment. On Mac OSX, a similar command-line interface is available through the Terminal application, generally found in the Utilities folder.

Steps

Steps to download the package through a browser are described below.

  • Point a browser to this ftp directory:
    ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/
  • Right-click the desired archive and select "Save link as…" from the pop-up menu
  • In the prompt, switch to the desired directory (folder) and click the "Save" button to save the archive to the local disk
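Of the non-browser tools mentioned above, wget and curl allow a non-interactive download. A minimal POSIX shell sketch, assuming at least one of the two is installed; fetch_archive and the DRY_RUN switch are hypothetical, and the archive name is the 2.2.29 example used throughout this document.

```shell
# Sketch: download a blast+ archive with wget or, failing that, curl.
# With DRY_RUN=1 the command is only printed, so nothing is downloaded.
fetch_archive() {
    url=$1
    if command -v wget >/dev/null 2>&1; then
        set -- wget "$url"
    elif command -v curl >/dev/null 2>&1; then
        set -- curl -O "$url"    # -O saves under the remote file name
    else
        echo "neither wget nor curl is installed" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"                # show the command instead of running it
    else
        "$@"
    fi
}

DRY_RUN=1 fetch_archive \
    ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/ncbi-blast-2.2.29+-x64-linux.tar.gz
```

Dropping DRY_RUN=1 performs the actual download into the current directory.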

Example

Downloading through an ftp client is shown below; user input is what is typed after the $, Name, Password, and ftp> prompts. For downloading through browsers, refer to Figures 1a and 1b in the PC setup document.

$ ftp ftp.ncbi.nlm.nih.gov
Connected to ftp.wip.ncbi.nlm.nih.gov.
220-
Warning Notice!
This is a U.S. Government computer system, which may be accessed and used 
[ ... extra warning message removed ... ]
There is no right of privacy in this system.
---
Welcome to the NCBI ftp server! The official anonymous access URL is ftp://ftp.ncbi.nih.gov 
Public data may be downloaded by logging in as "anonymous" using your E-mail
address as a password.
Please see ftp://ftp.ncbi.nih.gov/README.ftp for hints on large file transfers
220 FTP Server ready.
Name (ftp.ncbi.nlm.nih.gov:tao): anonymous
331 Anonymous login ok, send your complete email address as your password.
Password: [note: enter your email address at this prompt]
230-Anonymous access granted, restrictions apply.
Please read the file README.ftp
230    it was last modified on Fri Mar 28 14:05:45 2008 - 716 days ago
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd blast/executables/LATEST/
250 CWD command successful
ftp> bin
200 Type set to I
ftp> get ncbi-blast-2.2.29+-x64-linux.tar.gz
local: ncbi-blast-2.2.29+-x64-linux.tar.gz remote: ncbi-blast-2.2.29+-x64-linux.tar.gz
227 Entering Passive Mode (130,14,29,30,215,39)
150 Opening BINARY mode data connection for ncbi-blast-2.2.29+-x64-linux.tar.gz (158357911 bytes)
226 Transfer complete
158357911 bytes received in 2.88 secs (54996.76 Kbytes/sec)
ftp> bye
221 Goodbye.
$

For platforms lacking a precompiled blast+ package, users need to compile from the BLAST source code. The source code file, "ncbi-blast-#.#.#+-src", in either zip or gzipped tar format, is available from the same ftp directory as the blast+ packages. Questions and feedback on source code compilation should be addressed to:

toolbox@ncbi.nlm.nih.gov 

Installation

To install, simply extract the downloaded package after placing it under a desired directory. This can be accomplished by a single tar command, or a combination of gunzip and tar commands.

$ tar zxvpf ncbi-blast-2.2.29+-x64-linux.tar.gz

or

$ gunzip -d ncbi-blast-2.2.29+-x64-linux.tar.gz
$ tar xvpf ncbi-blast-2.2.29+-x64-linux.tar

Successful execution of the above commands installs the package and generates a new ncbi-blast-2.2.29+ directory under the working directory selected. This new directory contains the bin and doc subdirectories, as well as a VERSION file. The bin subdirectory contains the programs listed below.

Table 2

Programs contained in blast+ package

Program              Function
blastdbcheck         Checks the integrity of a BLAST database
blastdbcmd           Retrieves sequences or other information from a BLAST database
blastdb_aliastool    Creates a database alias (for example, to tie volumes together)
blastn               Searches a nucleotide query against a nucleotide database
blastp               Searches a protein query against a protein database
blastx               Searches a nucleotide query, dynamically translated in all six frames, against a protein database
blast_formatter      Formats a BLAST result using its assigned request ID (RID) or its saved archive
convert2blastmask    Converts lowercase masking into data readable by makeblastdb
deltablast           Searches a protein query against a protein database, using a more sensitive algorithm
dustmasker           Masks low-complexity regions in input nucleotide sequences
legacy_blast.pl      Converts a legacy BLAST command line into its blast+ counterpart and executes it
makeblastdb          Formats input FASTA file(s) into a BLAST database
makembindex          Indexes an existing nucleotide database for use with megablast
makeprofiledb        Creates a conserved domain database from a list of input position-specific scoring matrices (scoremats) generated by psiblast
psiblast             Finds members of a protein family, identifies proteins distantly related to the query, or builds a position-specific scoring matrix for the query
rpsblast             Searches a protein query against a conserved domain database to identify functional domains present in the query
rpstblastn           Searches a nucleotide query, dynamically translated in all six frames, against a conserved domain database
segmasker            Masks low-complexity regions in input protein sequences
tblastn              Searches a protein query against a nucleotide database dynamically translated in all six frames
tblastx              Searches a nucleotide query, dynamically translated in all six frames, against a nucleotide database similarly translated
update_blastdb.pl    Downloads preformatted BLAST databases from NCBI
windowmasker         Masks repeats found in input nucleotide sequences
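Extraction and a basic check of the resulting layout (bin and doc subdirectories plus a VERSION file, as described above) can be combined. The sketch below is a hypothetical helper, install_blast, written for POSIX sh; it assumes a tar that understands -z and -C (GNU and BSD tar both do) and that the caller knows the top-level directory name, ncbi-blast-2.2.29+ in this example.

```shell
# Sketch: extract a blast+ archive into DEST, then confirm the layout
# described above (bin and doc subdirectories plus a VERSION file).
# Arguments: archive, destination directory, top-level directory name.
install_blast() {
    archive=$1 dest=$2 top=$3
    mkdir -p "$dest" || return 1
    # Same as "tar zxvpf" above, minus -v (no per-file listing).
    tar -zxpf "$archive" -C "$dest" || return 1
    for item in bin doc VERSION; do
        if [ ! -e "$dest/$top/$item" ]; then
            echo "missing after extraction: $dest/$top/$item" >&2
            return 1
        fi
    done
    echo "installed: $dest/$top"
}

# Example (illustrative paths):
# install_blast ncbi-blast-2.2.29+-x64-linux.tar.gz "$HOME" ncbi-blast-2.2.29+
```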

Configuration

Using the blast+ package installed above without configuration can be cumbersome: because the system does not know where to look for the installed programs or the specified database, the full path must be prefixed to each program call and database name. To streamline BLAST searches, two environment variables need to be set: PATH must be extended to include the BLAST bin directory, and BLASTDB must point to the database directory.

Under bash, the following command appends the path to the new BLAST bin directory to the existing PATH setting:

$ export PATH="$PATH:$HOME/ncbi-blast-2.2.29+/bin"

The equivalent command under csh is:

$ setenv PATH "${PATH}:$HOME/ncbi-blast-2.2.29+/bin"

The modified PATH can be examined using echo; the added portion is the last entry:

$ echo $PATH
/usr/X11R6/bin:/usr/bin:/bin:/usr/local/bin:/opt/local/bin:/home/tao/ncbi-blast-2.2.29+/bin
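If the export line is placed in a login file that may be read more than once, PATH can accumulate duplicate entries. A minimal sketch of a guard, using a hypothetical add_to_path function that works in bash and POSIX sh:

```shell
# Sketch (bash or POSIX sh): append a directory to PATH only when it is
# not already listed, so re-reading a login file does not duplicate it.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present: leave PATH alone
        ::) PATH=$1 ;;            # PATH was empty
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

add_to_path "$HOME/ncbi-blast-2.2.29+/bin"
add_to_path "$HOME/ncbi-blast-2.2.29+/bin"   # second call changes nothing
echo "$PATH"
```

Matching against ":$PATH:" with colons on both sides avoids false hits on directories that merely share a prefix.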

To manage the available BLAST databases, a subdirectory named db should be created. For the example installation, the following command, issued from the home directory, creates such a directory under the ncbi-blast-2.2.29+ directory:

$ mkdir ./ncbi-blast-2.2.29+/db

A similar approach sets the BLASTDB value under bash:

$ export BLASTDB="$HOME/ncbi-blast-2.2.29+/db"

The equivalent command under csh is:

$ setenv BLASTDB "$HOME/ncbi-blast-2.2.29+/db"

A better approach is to have these variables set automatically upon login, by adding the commands above to the .bash_profile (bash) or .cshrc (csh) file.
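A sketch of that approach for bash, using a hypothetical persist_blast_env function that appends the two export lines to a login file only once. The marker comment it uses to detect an earlier run is an assumption of this sketch, and the paths assume the package was extracted in the home directory as in the example installation.

```shell
# Sketch: append the bash settings to a login file exactly once.
# The "# blast+ settings" marker line is a hypothetical convention used to
# detect an earlier run; adjust the paths to match the real installation.
persist_blast_env() {
    profile=$1
    marker="# blast+ settings"
    if grep -qxF "$marker" "$profile" 2>/dev/null; then
        echo "already configured: $profile"
        return 0
    fi
    # The quoted 'EOF' keeps $PATH and $HOME literal, so they are
    # expanded at each login rather than now.
    cat >> "$profile" <<'EOF'
# blast+ settings
export PATH="$PATH:$HOME/ncbi-blast-2.2.29+/bin"
export BLASTDB="$HOME/ncbi-blast-2.2.29+/db"
EOF
    echo "updated: $profile"
}

# Usage: persist_blast_env "$HOME/.bash_profile"
```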

Once they are set, the system knows where to find the BLAST programs, and the invoked program knows where to look for the database files. Note that with BLASTDB unset, blast+ programs search only the working directory, i.e. the directory where the BLAST command is issued.
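To confirm the settings in a new shell, the sketch below reports where each named program is found through PATH, using the POSIX command -v lookup, and prints the current BLASTDB value. report_blast_programs is a hypothetical helper, and the program list in the example call is a subset of Table 2.

```shell
# Sketch: report where the shell finds each named program through PATH,
# using the POSIX "command -v" lookup.
report_blast_programs() {
    for prog in "$@"; do
        if loc=$(command -v "$prog"); then
            printf '%s\t%s\n' "$prog" "$loc"
        else
            printf '%s\t%s\n' "$prog" "NOT FOUND"
        fi
    done
}

report_blast_programs blastn blastp blastdbcmd makeblastdb update_blastdb.pl
echo "BLASTDB=${BLASTDB:-(not set)}"
```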

Database Download

A BLAST database is a key component of any BLAST search, so a functional database is needed to fully test the blast+ package installed above. The following work session demonstrates downloading and installing refseq_rna.00.tar.gz, the first volume of the preformatted refseq_rna BLAST database from NCBI.

$ cd ncbi-blast-2.2.29+/db
/home/tao/ncbi-blast-2.2.29+/db$ ftp ftp.ncbi.nlm.nih.gov
Connected to ftp.wip.ncbi.nlm.nih.gov.
220-
Warning Notice!
[ ... Extra warning message removed for brevity ... ]
Name (ftp.ncbi.nlm.nih.gov:tao): anonymous
331 Anonymous login ok, send your complete email address as your password.
Password: [note: enter your email address at this prompt]
230-Anonymous access granted, restrictions apply.
Please read the file README.ftp
230    it was last modified on Fri Mar 28 14:05:45 2008 - 716 days ago
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd blast/db
250 CWD command successful
ftp> bin
200 Type set to I
ftp> get refseq_rna.00.tar.gz
local: refseq_rna.00.tar.gz remote: refseq_rna.tar.gz
229 Entering Extended Passive Mode (|||50279|)
150 Opening BINARY mode data connection for refseq_rna.tar.gz (857150245 bytes)
100% |*************************************| 817 MB  21.48 MB/s  00:00 ETA
226 Transfer complete.
857150245 bytes received in 00:38 (21.48 MB/s)
ftp> bye
221 Goodbye. 
/home/tao/ncbi-blast-2.2.29+/db$ 

Decompressing the archive and extracting the tar file regenerates the files for this database. The example tar command and its output are given below. To save disk space, the refseq_rna.00.tar.gz file can be removed after installation.

/home/tao/ncbi-blast-2.2.29+/db$ tar zxvpf refseq_rna.00.tar.gz
refseq_rna.nal
refseq_rna.00.nhr
refseq_rna.00.nin
refseq_rna.00.nnd
refseq_rna.00.nni
refseq_rna.00.nog
refseq_rna.00.nsd
refseq_rna.00.nsi
refseq_rna.00.nsq
/home/tao/ncbi-blast-2.2.29+/db$ ls -ltr refseq_rna.00*
-rw-rw-r-- 1 tao sdesk   30757308 Dec 15 19:24 refseq_rna.00.nin
-rw-rw-r-- 1 tao sdesk  999999519 Dec 15 19:24 refseq_rna.00.nsq
-rw-rw-r-- 1 tao sdesk      80420 Dec 15 19:24 refseq_rna.00.nni
-rw-rw-r-- 1 tao sdesk   20575424 Dec 15 19:24 refseq_rna.00.nnd
-rw-rw-r-- 1 tao sdesk  373245118 Dec 15 19:24 refseq_rna.00.nhr
-rw-rw-r-- 1 tao sdesk    2260381 Dec 15 19:25 refseq_rna.00.nsi
-rw-rw-r-- 1 tao sdesk  108664178 Dec 15 19:25 refseq_rna.00.nsd
-rw-rw-r-- 1 tao sdesk   10252432 Dec 15 19:25 refseq_rna.00.nog
-rw-r--r-- 1 tao sdesk 1048449064 Apr 14 16:32 refseq_rna.00.tar.gz
/home/tao/ncbi-blast-2.2.29+/db$ rm refseq_rna.00.tar.gz
/home/tao/ncbi-blast-2.2.29+/db$
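With BLASTDB pointing at this directory, the installed databases can be listed by their alias and index files: .nal and .nin files appear in the listing above, and .pal and .pin are their protein counterparts. The POSIX shell sketch below uses a hypothetical list_blastdbs function that only inspects file names; it does not check database integrity, which is the job of blastdbcheck (Table 2).

```shell
# Sketch: list database names found in a directory (default: $BLASTDB)
# from their alias (.nal/.pal) and index (.nin/.pin) files. A multi-volume
# database shows up both under its alias name (refseq_rna) and under each
# volume name (refseq_rna.00).
list_blastdbs() {
    dir=${1:-$BLASTDB}
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "database directory not found: ${dir:-(BLASTDB not set)}" >&2
        return 1
    fi
    for f in "$dir"/*.nal "$dir"/*.pal "$dir"/*.nin "$dir"/*.pin; do
        [ -e "$f" ] || continue          # skip patterns that matched nothing
        name=${f##*/}                    # strip the directory part
        printf '%s\n' "${name%.*}"       # strip the extension
    done | sort -u
}

list_blastdbs "$HOME/ncbi-blast-2.2.29+/db"
```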

The same procedure can be used to download the remaining volumes or other BLAST databases. For regular batch database download/update, take advantage of the update_blastdb.pl script included in the package. This script can automatically download all the volumes of a large database.
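When several volumes have been downloaded into the database directory, they can be unpacked in one pass. A minimal POSIX shell sketch, assuming the two-digit volume suffix shown above (.00, .01, ...); extract_volumes is a hypothetical helper, and it deletes each archive only after tar reports success, so a failed extraction can be retried.

```shell
# Sketch: unpack every downloaded volume of a database (for example
# refseq_rna.00.tar.gz, refseq_rna.01.tar.gz, ...) in the current
# directory, deleting each archive only after tar succeeds.
extract_volumes() {
    db=$1
    for archive in "$db".[0-9][0-9].tar.gz; do
        [ -e "$archive" ] || { echo "no volumes found for $db" >&2; return 1; }
        if tar -zxpf "$archive"; then
            rm -f "$archive"
        else
            echo "failed to extract $archive; archive kept" >&2
            return 1
        fi
    done
}

# Example (run inside the BLASTDB directory): extract_volumes refseq_rna
```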

Execution and validation

With the above blast+ setup, the BLAST programs installed under the "ncbi-blast-2.2.29+/bin" directory can be invoked by name from any directory. Typing the command "blastn -help" (without quotes) displays the blastn program parameters on the console.

$ blastn -help
USAGE
  blastn [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-entrez_query entrez_query]
    [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
    [-subject subject_input_file] [-subject_loc range] [-query input_file]
    [-out output_file] [-evalue evalue] [-word_size int_value]
    [-gapopen open_penalty] [-gapextend extend_penalty]
    [-perc_identity float_value] [-xdrop_ungap float_value]
    [-xdrop_gap float_value] [-xdrop_gap_final float_value]
    [-searchsp int_value] [-max_hsps int_value] [-sum_statistics]
    [-penalty penalty] [-reward reward] [-no_greedy]
    [-min_raw_gapped_score int_value] [-template_type type]
    [-template_length int_value] [-dust DUST_options]
    [-filtering_db filtering_database]
    [-window_masker_taxid window_masker_taxid]
    [-window_masker_db window_masker_db] [-soft_masking soft_masking]
    [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
    [-best_hit_score_edge float_value] [-window_size int_value]
    [-off_diagonal_range int_value] [-use_index boolean] [-index_name string]
    [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]
    [-outfmt format] [-show_gis] [-num_descriptions int_value]
    [-num_alignments int_value] [-html] [-max_target_seqs num_sequences]
    [-num_threads int_value] [-remote] [-version]

DESCRIPTION
   Nucleotide-Nucleotide BLAST 2.2.29+

OPTIONAL ARGUMENTS
 -h
   Print USAGE and DESCRIPTION;  ignore all other parameters
 -help
   Print USAGE, DESCRIPTION and ARGUMENTS; ignore all other parameters
 -version
   Print version number;  ignore other arguments

 *** Input query options
 -query <File_In>
   Input file name
   Default = `-'
 -query_loc <String>
   Location on the query sequence in 1-based offsets (Format: start-stop)
 -strand <String, `both', `minus', `plus'>
   Query strand(s) to search against database/subject
   Default = `both'
... 
 

Note: If PATH has not been modified, prefix the program name with its path. For example, to execute the same command from the /home/tao directory, use the following command instead, where the "./" prefix denotes the current working directory:

/home/tao$ ./ncbi-blast-2.2.29+/bin/blastn -help

Example Execution

The real test of this installation is an example search. The work session shown below performs the following tasks:

  • Call blastdbcmd to extract the sequence of nm_000122 from the installed database (refseq_rna.00) to a text file (test_query.fa)
  • Run a test blastn search against the refseq_rna.00 database, using the sequence in test_query.fa as the query

$ blastdbcmd -db refseq_rna.00 -entry nm_000122 -out test_query.fa
$ blastn -query test_query.fa -db refseq_rna.00 -task blastn -dust no -outfmt "7 qseqid sseqid evalue bitscore" -max_target_seqs 2
# BLASTN 2.2.29+
# Query: gi|263191547|ref|NM_000122.3| Homo sapiens mutL homolog 1 (MLH1), transcript variant 1, mRNA
# Database: refseq_rna.00
# Fields: query id, subject id, evalue, bit score
# 2 hits found
gi|263191547|ref|NM_000122.3|   gi|263191547|ref|NM_000122.3|   0.0      4801
gi|263191547|ref|NM_000122.3|   gi|332816398|ref|XM_001170433.2|        0.0      4758
# BLAST processed 1 queries
$

Note that the command lines and output wrap around.
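Tabular output (-outfmt 7) is easy to post-process: lines beginning with "#" are comments, and each remaining line is one hit whose tab-separated fields follow the order given on the command line. The sketch below, a hypothetical summarize_hits shell function built on awk, assumes the field order qseqid sseqid evalue bitscore used in the example and that hits are listed best-first, as in the output above.

```shell
# Sketch: summarise -outfmt 7 output read from standard input.
# Assumes the fields "qseqid sseqid evalue bitscore" in that order.
summarize_hits() {
    awk -F '\t' '
        /^#/ { next }                 # skip header and comment lines
        NF >= 4 {
            n++
            if (n == 1) top = $2 " (bit score " $4 ")"
        }
        END { printf "%d hit(s); top hit: %s\n", n, (n ? top : "none") }
    '
}

# Example:
# blastn -query test_query.fa -db refseq_rna.00 -task blastn -dust no \
#   -outfmt "7 qseqid sseqid evalue bitscore" -max_target_seqs 2 | summarize_hits
```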

Setup Steps For Legacy blast

The original standalone BLAST package based on the NCBI C toolkit (legacy blast) is deprecated. The installation of the legacy blast package differs from that of blast+ described above. The key differences are summarized below.

a. The legacy blast packages are located under a different ftp directory:

ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/LATEST/

b. The packages are named with this convention: blast-#.#.#-CHIP-OS.tar.gz, where #.#.# is the version, CHIP is the chipset, and OS is the operating system.

c. The program names and functions are different (see Table 3 below for details).

d. The path to the extra data directory must be specified in the DATA environment variable, or in a file called .ncbirc in the format shown below:

[NCBI]
DATA=/path/blast-#.#.#/data

Table 3

Programs contained in the legacy blast package

The legacy blast commands comparable to the blast+ examples given above are:

blastall -

fastacmd -d refseq_rna -s nm_000122 -o test_query.fa

blastall -p blastn -i test_query.fa -d refseq_rna -F F -m 9 -b 2 -v 2
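The .ncbirc file described in item d can also be created from a script. The sketch below, for POSIX sh, uses a hypothetical write_ncbirc function; the text does not say where the file must be placed, so the home directory in the example call is an assumption, as is the installation path blast-2.2.26 (named after the last legacy release mentioned above).

```shell
# Sketch: write the .ncbirc file described in item d for legacy blast.
# "/path/blast-#.#.#" in the text is a placeholder; pass the directory the
# legacy package was actually extracted to. Any existing file is replaced.
write_ncbirc() {
    rc=$1 blast_dir=$2
    if [ ! -d "$blast_dir/data" ]; then
        echo "warning: $blast_dir/data does not exist" >&2
    fi
    printf '[NCBI]\nDATA=%s/data\n' "$blast_dir" > "$rc"
}

# Example, assuming the package was extracted to $HOME/blast-2.2.26:
# write_ncbirc "$HOME/.ncbirc" "$HOME/blast-2.2.26"
```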

Technical Assistance

Questions, feedback, and technical assistance requests should be sent to blast-help at:

blast-help@ncbi.nlm.nih.gov 

Questions on other NCBI resources should be addressed to NCBI Service Desk at:

info@ncbi.nlm.nih.gov 
Copyright Notice. BLAST is a registered Trademark of the National Library of Medicine.
Bookshelf ID: NBK52640