How to Maintain Current Local Databases
Because of the enormous size of the sequence data in GenBank, searching with very large queries or with huge batches of smaller sequences is difficult. The Web interface that NCBI designed for general-purpose searches cannot be used for batch searching and should not be used for searches expected to consume a very large amount of CPU time. Power users can instead set up standalone BLAST and run computationally intensive searches locally, but that raises a new problem: downloading GenBank for local searching. To help with this problem, NCBI offers a set of standard BLAST databases in FASTA format on the NCBI BLAST FTP site. These files are far smaller than the corresponding full GenBank records.
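To make the file format concrete: each FASTA record is a definition line beginning with ">" followed by one or more lines of sequence. The sketch below is a minimal Python reader for that layout, offered for illustration only; it is not an NCBI tool, and the function name read_fasta and the sample records are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (defline, sequence) pairs from FASTA-formatted text.

    The defline is returned without its leading '>'; sequence lines are joined.
    """
    defline = None
    chunks = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if defline is not None:
                yield defline, "".join(chunks)
            defline, chunks = line[1:], []
        elif defline is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)
    if defline is not None:
        yield defline, "".join(chunks)


if __name__ == "__main__":
    # Hypothetical two-record sample; real database files hold many records.
    sample = io.StringIO(">p1 protein A\nMKVL\nLA\n>p2 protein B\nGGA\n")
    for defline, seq in read_fasta(sample):
        print(defline, len(seq))
```

Reading lazily with a generator keeps memory use flat, which matters for files the size of nr or nt.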
These BLAST FASTA files are updated as often as NCBI updates the databases used by the BLAST servers (nightly for nr, nt, est, sts, month.na, month.aa, and gss). Nightly transfers of the entire nr or nt database, however, can be quite a burden on users and on NCBI systems, and they are unnecessary, because most of nr and nt does not change from one night to the next. NCBI therefore makes available a program that merges new entries from an update file into a non-redundant database (nr or nt). This program, fmerge, can be used to keep a local nr database current with the following two-step procedure:
1. FTP the newest nr database and run fmerge on it in create mode:
   fmerge -t 1 -n nr -i index.nr
2. Periodically FTP the newest month.aa file, which contains only the new sequences released by GenBank in the last 30 days, and run fmerge on it in update mode:
   fmerge -t 2 -m month.aa -i index.nr
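The two steps can be modeled in miniature. The Python sketch below is a toy, not fmerge's actual algorithm or index format (neither is described in this article). It assumes records are (defline, sequence) pairs, keys a hypothetical index on the first word of each defline, builds that index once in a "create" step, and in an "update" step appends only month records whose key is not yet indexed. In keeping with the article's caveat that fmerge does not remove redundancy, the sketch does not collapse identical sequences filed under different identifiers. All names here are illustrative.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (defline, sequence)


def record_id(defline: str) -> str:
    """Hypothetical key: the first whitespace-delimited token of the defline."""
    parts = defline.split(maxsplit=1)
    return parts[0] if parts else ""


def create_index(database: Iterable[Record]) -> Dict[str, int]:
    """Toy 'create' step: map each identifier to the position of its record."""
    index: Dict[str, int] = {}
    for pos, (defline, _seq) in enumerate(database):
        index.setdefault(record_id(defline), pos)
    return index


def update(database: List[Record], index: Dict[str, int],
           month: Iterable[Record]) -> int:
    """Toy 'update' step: append month records whose identifier is new.

    Identical sequences under different identifiers are NOT collapsed,
    so redundancy can accumulate between full refreshes.
    Returns the number of records added.
    """
    added = 0
    for defline, seq in month:
        key = record_id(defline)
        if key in index:
            continue  # already present locally
        index[key] = len(database)
        database.append((defline, seq))
        added += 1
    return added


if __name__ == "__main__":
    # Made-up records: p3 duplicates p1's sequence but is still appended.
    db = [("p1 protein A", "MKV"), ("p2 protein B", "GGA")]
    idx = create_index(db)
    n = update(db, idx, [("p2 protein B", "GGA"), ("p3 protein C", "MKV")])
    print(n, len(db))
```

In this toy model the update work scales with the size of the month file rather than with the whole database, which is the point of transferring month.aa instead of all of nr.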
A similar procedure can be used for the nt database of nucleotide sequences. In that case, the database is updated with the month.na file, which is analogous to the month.aa file used for the nr database.
Because fmerge does not remove redundant entries when it adds new sequences to the database, and because month.na contains est and sts sequences (which nt does not), a local nr or nt database will differ slightly from NCBI's version after it is updated from the corresponding month file. This divergence can be kept to a minimum by refreshing the non-redundant database (nr or nt) and removing the index file once a month.
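The schedule described above (periodic month-file updates plus a monthly full refresh) can be captured in a small helper that chooses which fmerge step to run. This Python sketch assumes a 30-day threshold for "once a month", and it assumes that nt is handled with -n nt, month.na, and an index file named index.nt by analogy with the nr commands; only the nr invocations appear in the article. The function fmerge_args is hypothetical and only builds argument lists; it does not download files or delete the index.

```python
from datetime import date
from typing import List

# Month file per database. The nt/month.na pairing follows the article;
# the nt command-line form below is assumed by analogy with nr.
MONTH_FILE = {"nr": "month.aa", "nt": "month.na"}


def fmerge_args(db: str, last_full_refresh: date, today: date,
                max_age_days: int = 30) -> List[str]:
    """Return fmerge arguments: create mode if a full refresh is due, else update mode."""
    if db not in MONTH_FILE:
        raise ValueError(f"unsupported database: {db!r}")
    index_file = f"index.{db}"  # hypothetical naming for nt; index.nr is from the article
    if (today - last_full_refresh).days >= max_age_days:
        # Full refresh due: the caller should first re-download the database
        # and remove the old index file, then rebuild it in create mode.
        return ["fmerge", "-t", "1", "-n", db, "-i", index_file]
    # Otherwise merge the newest month file into the existing index.
    return ["fmerge", "-t", "2", "-m", MONTH_FILE[db], "-i", index_file]


if __name__ == "__main__":
    print(fmerge_args("nr", date(2000, 1, 1), date(2000, 1, 15)))
    print(fmerge_args("nr", date(2000, 1, 1), date(2000, 2, 5)))
```

After the required FTP transfer, the returned list could be passed to Python's subprocess.run(..., check=True). Keeping the decision separate from execution makes the schedule easy to test without touching the network.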
The fmerge program is available by anonymous FTP from ftp://ncbi.nlm.nih.gov/blast/fmerge/.
The BLAST databases are available at ftp://ncbi.nlm.nih.gov/blast/db/.
SM, DW
The BLAST Lab feature is intended to provide detailed technical information on some of the more specialized uses of the BLAST family of programs. Topics are selected from the range of questions received by the BLAST Help Group.