NCBI News


Standalone BLAST Incorporates MegaBLAST, RPS-BLAST, and BLASTClust


With release 2.1.2, NCBI’s Standalone BLAST package now contains three new programs: MegaBLAST, RPS-BLAST, and BLASTClust. These programs supplement the standard distribution, which includes the basic BLAST program, blastall; blastpgp, which runs PSI- and PHI-BLAST; bl2seq (a standalone version of BLAST2Sequences); and formatdb (the program used to create BLAST-ready databases).

MegaBLAST uses an algorithm developed by Webb Miller and colleagues [1] that is designed to swiftly compare two large sets of nucleotide sequences that differ only slightly from one another, perhaps as a result of sequencing errors. MegaBLAST is about 10 times faster than BLAST and is used by NCBI to assemble the clusters that make up UniGene.

BLASTClust uses a single-linkage clustering process to group protein sequences based on pairwise matches found using the BLAST algorithm. The program accepts as input a file of concatenated protein sequences in FASTA format, each with a unique sequence identifier. It returns a file of sequence identifiers arranged in clusters.
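To illustrate what single-linkage clustering means here, the following Python sketch groups sequence identifiers from a list of pairwise matches. It is not BLASTClust's own code: the function name, the identifiers, and the match list are hypothetical, and the BLAST comparisons that would produce the matches are assumed to have been run already.

```python
from collections import defaultdict


def single_linkage_clusters(ids, matches):
    """Group ids so that any two connected by a chain of matches share a cluster."""
    parent = {i: i for i in ids}

    def find(x):
        # Walk up to the cluster representative, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # One qualifying match is enough to merge two clusters.
            parent[root_b] = root_a

    groups = defaultdict(list)
    for i in ids:
        groups[find(i)].append(i)
    # Largest clusters first; ties broken by first member so output is stable.
    return sorted(groups.values(), key=lambda g: (-len(g), g[0]))


# Hypothetical identifiers and matches, for illustration only.
ids = ["seqA", "seqB", "seqC", "seqD", "seqE"]
matches = [("seqA", "seqB"), ("seqB", "seqC"), ("seqD", "seqE")]
clusters = single_linkage_clusters(ids, matches)
for cluster in clusters:
    print(" ".join(cluster))
```

In this example seqA and seqC land in the same cluster even though they never match each other directly, because both match seqB. That chaining behavior is the defining property of single linkage.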

RPS-BLAST uses a protein sequence query to search a library of PSI-BLAST Position-Specific Score Matrices (PSSMs). The package also includes the programs “makemat” and “copymat”, which are required to create the RPS-BLAST PSSM libraries. Partially processed libraries, based on the protein domains in the PFAM and SMART databases, are available from NCBI via FTP at ftp://ncbi.nlm.nih.gov/pub/mmdb/cdd/.
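As a rough picture of what scoring a query against a position-specific matrix involves, the Python sketch below slides each matrix in a small library along a protein sequence and reports the best ungapped window. This is a deliberate simplification and not the RPS-BLAST algorithm; the matrices, their names, the default penalty, and the query are all invented for illustration.

```python
def best_pssm_hit(query, pssm, default=-4):
    """Slide the PSSM along the query; return (best_score, start_offset)."""
    width = len(pssm)
    if len(query) < width:
        raise ValueError("query is shorter than the PSSM")
    best = None
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        # Each column has its own score per residue; unlisted residues
        # receive the (assumed) default penalty.
        score = sum(col.get(res, default) for col, res in zip(pssm, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Hypothetical 3-column matrices. Real PSSMs are far wider and carry a
# score for every amino acid in every column.
library = {
    "toy_domain_1": [{"G": 6, "A": 1}, {"K": 5, "R": 3}, {"S": 4, "T": 4}],
    "toy_domain_2": [{"W": 11}, {"D": 6, "E": 4}, {"L": 4, "I": 3}],
}
query = "MAGKTLWEI"
hits = {name: best_pssm_hit(query, matrix) for name, matrix in library.items()}
for name, (score, start) in hits.items():
    print(name, score, start)
```

Unlike a single substitution matrix, the score for a residue depends on the column it is aligned to, which is why a library of such matrices can represent conserved protein domains.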


Additional Enhancements

In order to calculate more accurate Expect-values, BLAST and PSI-BLAST now take into account the amino acid composition of individual database sequences. The new Expect-value calculations use a scaling procedure [2,3] that creates a composition-corrected scoring system for each individual database sequence. Hence, identical alignments may receive different scores depending upon the amino acid composition of the database sequences involved. An option to generate XML output is also provided (using the command line option “-m 7”). Pick up Standalone BLAST 2.1.2 in the “executables” directory at ftp://ncbi.nlm.nih.gov/blast. —DW, SM




References
1. Zhang, Z, S Schwartz, L Wagner, and W Miller. J Comput Biol 7:203-14, 2000.
2. Altschul, SF, et al. Nucl Acids Res 25:3389-3402, 1997.
3. Schäffer, AA, et al. Bioinformatics 15:1000-1011, 1999.





NCBI News | Fall/Winter 2000