Frequently Asked Questions
Q: What happened to the Month database?
- 2007/06/30:2007/07/31[mdat] (mdat = modification date)
- 1 month[filter]
- 2 months[filter]
- 6 months[filter]
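The date-range term in the list above follows the pattern start:end[mdat]. As a minimal sketch, such a term could be built from two dates like this; the function name `mdat_range` is hypothetical and is not part of any NCBI tool.

```python
from datetime import date


def mdat_range(start: date, end: date) -> str:
    """Format a modification-date range as an Entrez term: YYYY/MM/DD:YYYY/MM/DD[mdat]."""
    if start > end:
        raise ValueError("start date must not be later than end date")
    # date objects accept strftime-style codes inside format specs
    return f"{start:%Y/%m/%d}:{end:%Y/%m/%d}[mdat]"


print(mdat_range(date(2007, 6, 30), date(2007, 7, 31)))
```

The resulting string can be pasted into the Entrez Query box like any other limit.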
Q: What are the lower case grey letters in the query sequence in BLAST results?
Q: Submitting primers or other short sequences
Primer-BLAST was designed to make primers that are specific to an input PCR template, using Primer3. It can also check user supplied primers for specificity.
The "Search for short, nearly exact matches" nucleotide and protein pages no longer exist. Instead, the nucleotide and protein BLAST programs automatically detect short queries and adjust the search parameters accordingly. This adjustment occurs when the query, whether nucleotide or amino acid, has a length of 30 or less. Neither the translating BLAST programs nor searches on the genome BLAST pages have this auto-adjust feature.
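The rule above can be sketched as a simple check. This is an illustration only: the set of program names is an assumption standing in for "the nucleotide and protein blast programs", the function name is hypothetical, and the sketch does not model which parameters the server changes or the genome pages.

```python
SHORT_QUERY_MAX_LENGTH = 30  # from the text: "of length 30 or less"

# Assumption: program names behind the nucleotide and protein BLAST pages.
# Translating programs (blastx, tblastn, tblastx) are deliberately absent.
AUTO_ADJUSTING_PROGRAMS = {"blastn", "blastp"}


def is_auto_adjusted(program: str, query: str) -> bool:
    """Return True if a query would trigger the short-query adjustment."""
    residues = "".join(query.split())  # ignore whitespace and line breaks
    if not residues:
        return False
    return (
        program.lower() in AUTO_ADJUSTING_PROGRAMS
        and len(residues) <= SHORT_QUERY_MAX_LENGTH
    )


print(is_auto_adjusted("blastn", "ACGTACGTACGTACGTACGT"))  # 20 bases
print(is_auto_adjusted("tblastx", "ACGTACGTACGTACGTACGT"))  # translating program
```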
Q: Default database for nucleotide-nucleotide searches
Q: Saving your search parameters
Q: How to limit a search to an organism or taxonomic group or exclude such groups
To search only sequences from an organism or taxonomic group, use the "Organism" text box. On the nucleotide blast pages, first click the radio button for "Others (nr etc.)". The "Organism" text box has an auto fill function. Begin to enter an organism common name (rat, bacteria, etc.), a genus or species (elegans, danio, etc.), or an NCBI taxonomy id; then select a name from the list.
The taxonomic group can also be excluded by using the "Exclude" checkbox to the right of the "Organism" box.
More taxonomic groups may be included or excluded with the "+" box further to the right of the "Organism" text box.
You can also use Entrez Query terms as before. Put those in the Entrez Query box just below the Organism field; for example, rattus norvegicus[organism] or simply, rat[orgn]. Also, see the FAQ, "How to limit a search to a subset of database sequences."
You can search for taxa in the Taxonomy Browser.
Q: How to exclude models (XM/XP accessions) and uncultured environmental sequences?
Q: How to limit a search to a subset of database sequences?
- to search against mammals other than human, use: mammals[orgn] NOT human[orgn]
- to exclude all mammals, use: all[filter] NOT mammals[orgn]
- to search against all records that contain "phosphorylase" in the title (definition line) of the record, use: phosphorylase[title].
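The examples above all combine field-tagged terms with NOT, using all[filter] as the starting set when nothing is explicitly included. A minimal sketch of a helper that assembles such strings follows; the name `entrez_limit` is hypothetical, and joining several included terms with OR is an assumption about what the caller wants.

```python
from typing import Iterable


def entrez_limit(include: Iterable[str] = (), exclude: Iterable[str] = ()) -> str:
    """Compose an Entrez Query limit from included and excluded terms."""
    include = list(include)
    exclude = list(exclude)
    # With nothing to include, start from every record, as in the text.
    base = " OR ".join(include) if include else "all[filter]"
    # Parenthesize an OR group so NOT applies to the whole group.
    if len(include) > 1 and exclude:
        base = f"({base})"
    return " NOT ".join([base] + exclude)


print(entrez_limit(include=["mammals[orgn]"], exclude=["human[orgn]"]))
print(entrez_limit(exclude=["mammals[orgn]"]))
print(entrez_limit(include=["phosphorylase[title]"]))
```

The three calls reproduce the three queries listed above.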
Q: How can I search a batch of sequences with BLAST?
- 1) Web megablast. This program is optimized for aligning nucleotide sequences that differ slightly as a result of sequencing or other similar "errors", and is good for scanning a large number of EST type sequences (about 500 kb in length) against a large database. You can import a file of EST sequences in FASTA format or as a list of GenBank accessions or GIs. The default output is an easily reviewable Hit Table format, although you can download and save the results in Standard pairwise HTML or any of the other result output options. Web megablast is available from the BLAST home page. Megablast is also part of the Standalone BLAST executables and an option in the Network BLAST client (see below).
- 2) Standalone BLAST executables. These are command line programs which run BLAST searches against local, downloaded copies of the NCBI BLAST databases, or against custom databases formatted for BLAST. The programs will handle either a single large file with multiple FASTA query sequences, or you can create a script to send multiple files one at a time. The executables are available for a wide variety of platforms, including many "flavors" of UNIX (LINUX, Solaris, etc.), Windows, and Mac OSX.
The Standalone package can be downloaded at http://www.ncbi.nlm.nih.gov/blast/download.shtml or the anonymous FTP location, ftp://ftp.ncbi.nih.gov/blast/executables/; get the "blast" package for your platform.
- 3) Network BLAST client (also called netblast and blastcl3). The Network client is a simple command-line program that allows you to submit a single file of FASTA sequences over an internet connection to the NCBI BLAST databases. You submit searches through the client to the NCBI servers and do not need to download the databases locally. There are client versions for various UNIX platforms, Windows, and Mac OSX.
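All three options take a file containing multiple FASTA query sequences. When writing a script around them, it can help to check what is in such a file first. The following is a minimal sketch of a multi-FASTA reader; `read_fasta` is a hypothetical helper, not part of the BLAST package, and it assumes plain ">"-prefixed header lines.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a multi-FASTA file."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


example = """>seq1
ACGT
ACGT
>seq2 example description
GGCC
"""
for name, seq in read_fasta(example.splitlines()):
    print(name, len(seq))
```

An open file object can be passed in place of `example.splitlines()`, since it also iterates over lines.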
Q: How to write a program to submit jobs to NCBI's BLAST servers
Q: How to use BLAST to align two sequences without a database search.
Q: What is the Expect (E) value?
The Expect value (E) is a parameter that describes the number of hits one can "expect" to see by chance when searching a database of a particular size. It decreases exponentially as the Score (S) of the match increases. Essentially, the E value describes the random background noise. For example, an E value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size one might expect to see 1 match with a similar score simply by chance.
The lower the E value, or the closer it is to zero, the more "significant" the match is. However, keep in mind that virtually identical short alignments have relatively high E values. This is because the calculation of the E value takes into account the length of the query sequence. These high E values make sense because shorter sequences have a higher probability of occurring in the database purely by chance. For more details, please see the calculations in the BLAST Course.
The Expect value can also be used as a convenient way to create a significance threshold for reporting results. You can change the Expect value threshold on most BLAST search pages. When the threshold is raised above the default value of 10, a longer list that includes more low-scoring hits can be reported.
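The behavior described above follows from the Karlin-Altschul form E = K * m * n * exp(-lambda * S), where m is the query length, n is the database length, and S is the raw score. The sketch below illustrates the relationship only: the K and lambda values are illustrative placeholders, the function name is hypothetical, and BLAST itself uses effective (edge-corrected) lengths, so the numbers will not match real BLAST output.

```python
import math

# Illustrative placeholder parameters; real values depend on the
# scoring matrix and gap penalties used in the search.
LAMBDA = 0.267
K = 0.041


def expect_value(score: float, query_len: int, db_len: int,
                 lam: float = LAMBDA, k: float = K) -> float:
    """E = K * m * n * exp(-lambda * S), using raw lengths as a simplification."""
    return k * query_len * db_len * math.exp(-lam * score)


db_len = 1_000_000_000  # assumed database size in residues
for score in (40, 60, 80):
    # E shrinks exponentially as the score grows
    print(score, expect_value(score, query_len=300, db_len=db_len))

# A short, perfect match can only reach a low score, so its E value stays high.
print(expect_value(score=20, query_len=20, db_len=db_len))
```

Doubling the database length doubles E for the same score, which is why the same hit looks less significant in a larger database.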
Q: What is "low-complexity" sequence?
Regions with low-complexity sequence have an unusual composition that can create problems in sequence similarity searching. For amino acid queries this compositional bias is determined by the SEG program (Wootton and Federhen, 1996). For nucleotide queries it is determined by the DustMasker program (Morgulis, et al., 2006).
Low-complexity sequence can often be recognized by visual inspection. For example, the protein sequence PPCDPPPPPKDKKKKDDGPP has low complexity and so does the nucleotide sequence AAATAAAAAAAATAAAAAAT. Filters are used to remove low-complexity sequence because it can cause artifactual hits.
In BLAST searches performed without a filter, high scoring hits may be reported only because of the presence of a low-complexity region. Most often, it is inappropriate to consider this type of match as the result of shared homology. Rather, it is as if the low-complexity region is "sticky" and is pulling out many sequences that are not truly related.
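To make the idea of biased composition concrete, the sketch below scores a sequence by the Shannon entropy of its residue composition. This is not the SEG or DustMasker algorithm; it is a much simpler stand-in used only to show that the two example sequences above have far less compositional variety than a mixed sequence. The function name and the comparison sequences are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Entropy (bits per residue) of the residue composition of seq."""
    if not seq:
        return 0.0
    counts = Counter(seq.upper())
    total = len(seq)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


low_protein = "PPCDPPPPPKDKKKKDDGPP"      # example from the text
low_nucleotide = "AAATAAAAAAAATAAAAAAT"   # example from the text
mixed_protein = "ACDEFGHIKLMNPQRSTVWY"    # each amino acid once
mixed_nucleotide = "ACGTACGTACGTACGTACGT"

for name, seq in [("low protein", low_protein), ("mixed protein", mixed_protein),
                  ("low nucleotide", low_nucleotide), ("mixed nucleotide", mixed_nucleotide)]:
    print(name, round(shannon_entropy(seq), 2))
```

Real filters work on sliding windows and use different statistics, so a composition score like this should not be used to predict what BLAST will mask.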
Q: How to filter out (organism-specific) interspersed repeats?
ERROR: "No significant similarity found"
- Short query sequences: Short alignments may have Expect values above the default threshold, which is 10 on most pages, and are therefore not displayed. Try increasing the Expect threshold (under 'Algorithm parameters'). Also, see the FAQ, "Submitting primers or other short sequences."
- Filtering: Some of the BLAST programs mask regions of low complexity by default. These regions are not allowed to initiate alignments, so if your query is largely low complexity, the filter may prevent all hits to the database. On the Basic BLAST pages, adjust the filter settings in the section 'Filters and Masking', under 'Algorithm parameters'. For a description of low complexity filters, see "What is low-complexity sequence?"
ERROR: An error has occurred on the server, Too many HSPs to save all
- 1) Enable the species-specific repeats filter, if applicable; see the FAQ, "How to filter out (organism-specific) interspersed repeats."
- 2) If using tblastx, try blastx instead. The tblastx program is very CPU intensive as it not only translates the query in six reading frames but every database sequence as well. Often, using tblastx is a measure of last resort; a blastx search against a database of known proteins may provide what you need.
- 3) Search a smaller database, such as refseq_rna. Larger databases obviously contain more sequences and for some queries this results in numerous "background" hits. If you want a database of known mRNAs (and their translations) then refseq_rna is a good choice.
- 4) Break up large queries into smaller pieces; submit each piece in a separate search. A common cause of errors in BLAST is searching with a huge sequence, like a complete chromosome, against a large database like nr. This is better accomplished in portions rather than one large, continuous sequence.
- 5) Limit the database by taxonomy. Start with large groups, such as mammals, bacteria, etc. Any taxonomic node or tax id number that you can find in the Taxonomy browser can be used in the 'Organism' text box; see the BLAST FAQ, "How to limit a search to an organism or taxonomic group." Also see the Taxonomy browser.
- 6) You may be hitting a large number of 'PREDICTED' or 'hypothetical protein' records. If you do not want these hits, use an Entrez Query such as: all[filter] NOT predicted[title].
- 7) For megablast and blastn searches, try increasing the word size and/or decreasing the Expect threshold.
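The suggestion to break up large queries into smaller pieces can be sketched as follows. The function name, the piece length, and the optional overlap between neighboring pieces are all assumptions made for illustration; the text above does not prescribe a piece size or say whether pieces should overlap.

```python
from typing import List


def split_query(sequence: str, piece_length: int, overlap: int = 0) -> List[str]:
    """Split a long sequence into pieces of at most piece_length residues."""
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    if not 0 <= overlap < piece_length:
        raise ValueError("overlap must be >= 0 and smaller than piece_length")
    step = piece_length - overlap
    pieces = []
    for start in range(0, len(sequence), step):
        pieces.append(sequence[start:start + piece_length])
        if start + piece_length >= len(sequence):
            break  # the last piece already reaches the end
    return pieces


pieces = split_query("ACGTACGTAC", piece_length=4)
for number, piece in enumerate(pieces, start=1):
    # Each piece would be submitted as its own FASTA record or search.
    print(f">piece_{number}\n{piece}")
```

A small overlap keeps an alignment that spans a cut point from being lost entirely, at the cost of some duplicated hits.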
ERROR: An error has occurred on the server, [blastsrv4.REAL]:Error: CPU usage limit was exceeded, resulting in SIGXCPU (24).
If you get this error you have numerous options depending on your goals. See the BLAST FAQ, "ERROR: An error has occurred on the server, Too many HSPs to save all".
Q: Why do I get the message "ERROR:BLASTSetUpSearch: Unable to calculate Karlin-Altschul params, check query sequence"?
Q: Why do some batch searches on the web seem to take longer than expected?
- 1st request: current time
- 2nd request: current time + 60 seconds
- 3rd request: current time + 120 seconds
- 4th request: current time + 180 seconds
- 5th request: current time + 240 seconds
The BLAST server works through requests in order from earliest to latest TOE (time of execution). A query will be executed before its TOE if there are no other queries with an earlier TOE. Users with large numbers of queries are encouraged to use the BLAST servers at off-peak hours, which are from 8 p.m. to 8 a.m. (EST).
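The schedule above can be sketched as a small simulation: each request in a batch gets a TOE spaced 60 seconds after the previous one, and the queue is served in TOE order. This is a simplified model based only on the description here, not the server's actual implementation; the function names and the second user's request are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

TOE_SPACING = timedelta(seconds=60)  # spacing taken from the schedule above


def assign_toes(now: datetime, n_requests: int) -> List[datetime]:
    """Give the i-th request of a batch a TOE of now + i * 60 seconds."""
    return [now + i * TOE_SPACING for i in range(n_requests)]


def execution_order(queue: List[Tuple[datetime, str]]) -> List[str]:
    """Return request labels ordered from earliest to latest TOE."""
    return [label for _, label in sorted(queue, key=lambda item: item[0])]


now = datetime(2024, 1, 1, 12, 0, 0)
# User A submits three requests at once; user B submits one 30 seconds later.
batch = [(toe, f"A{i + 1}") for i, toe in enumerate(assign_toes(now, 3))]
single = [(now + timedelta(seconds=30), "B1")]
print(execution_order(batch + single))
```

In this model the later requests of a large batch are ordered behind other users' requests with earlier TOEs, which is why a batch can appear slow during busy periods.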