| MEGABLAST is the tool of choice to identify a sequence. |
The best way to identify an unknown sequence is to see if
that sequence already exists in a public database. If the
database sequence is a well-characterized sequence, then you
may have access to a wealth of biological information. All of
the nucleotide-nucleotide BLAST programs can be used to
accomplish this goal. However, MEGABLAST is specifically
designed to efficiently find long alignments between very
similar sequences and is thus the best tool for finding
an identical match to your query sequence. In addition to
the expect value significance cut-off, MEGABLAST also
provides an adjustable percent identity cut-off that
overrides the significance threshold.
NOTE: Web MEGABLAST can also accept batch queries; see the
batch query section below for details.
|
| Standard nucleotide BLAST is better at finding sequences similar, but not identical, to your query. |
The BLAST nucleotide algorithm finds similar sequences by
generating an indexed table or dictionary of short
subsequences called words for both the query and the
database. The program can then rapidly find initial exact
matches to the query words by simply looking up a particular
word in the database dictionary. These initial matches serve
as starting points for longer alignments that are generated
in several steps, ending with a final gapped alignment.
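The word lookup step can be sketched in Python. This is a simplified illustration, not BLAST's actual implementation: the hypothetical helpers `build_word_index` and `find_seeds` index only the subject sequence, search a single strand, and stop at the exact word hits, omitting the later extension into ungapped and gapped alignments.

```python
from collections import defaultdict


def build_word_index(seq: str, word_size: int) -> dict:
    """Map each word (exact substring of length word_size) to its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - word_size + 1):
        index[seq[i:i + word_size]].append(i)
    return dict(index)


def find_seeds(query: str, subject: str, word_size: int) -> list:
    """Return (query_pos, subject_pos) pairs where a query word exactly matches a subject word."""
    subject_index = build_word_index(subject, word_size)
    seeds = []
    for q_pos in range(len(query) - word_size + 1):
        word = query[q_pos:q_pos + word_size]
        # A dictionary lookup replaces a scan of the whole subject sequence.
        for s_pos in subject_index.get(word, []):
            seeds.append((q_pos, s_pos))
    return seeds


print(find_seeds("ACGTACGT", "TTACGTAA", word_size=4))  # [(0, 2), (1, 3), (3, 1), (4, 2)]
```

Each seed is only a starting point; in BLAST, seeds on the same diagonal are then extended and scored.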
One of the important parameters governing the sensitivity of
BLAST searches is the length of the initial words (word
size). The most important reason that blastn is more
sensitive than MEGABLAST is that it uses a shorter default
word size. Because of this, blastn is better than MEGABLAST
at finding alignments to related nucleotide sequences from
other organisms since the initial exact match can be shorter.
The word size is adjustable in blastn: it can be reduced from
the default value of 11 to a minimum of 7 to increase
sensitivity, or increased to speed up the search and limit
the number of database hits. Alternatively, you can use
MEGABLAST with a relaxed percent identity cut-off (the
default is 99%).
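To see why word size governs sensitivity, consider a hypothetical query and subject that are 87.5% identical, with one mismatch every 8 bases. The sketch below (the helper `has_word_hit` is illustrative, not part of BLAST) checks whether the two share any identical word: with a word size of 11 there is no exact match from which to start an alignment, while a word size of 7 finds one.

```python
def has_word_hit(query: str, subject: str, word_size: int) -> bool:
    """True if query and subject share at least one identical word of length word_size."""
    subject_words = {subject[i:i + word_size] for i in range(len(subject) - word_size + 1)}
    return any(query[i:i + word_size] in subject_words
               for i in range(len(query) - word_size + 1))


# Hypothetical pair: 87.5% identical, with one mismatch every 8 bases,
# so no identical stretch is longer than 7 bases.
query = "ACGTTGCA" * 4
subject = "ACGTTGCG" * 4

print(has_word_hit(query, subject, word_size=11))  # False: no seed at the blastn default
print(has_word_hit(query, subject, word_size=7))   # True: a seed at the blastn minimum
```

Real sequences rarely have such evenly spaced mismatches, but the example shows the principle: a longer word demands a longer run of perfect identity before any alignment can begin.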
Nucleotide-nucleotide searches are not the recommended way to
find homologous protein coding regions in other organisms. It
is better to perform searches at the protein level, either
with translations of the nucleotide sequences or by direct
protein-protein BLAST. This is because of the degeneracy of the genetic code, the greater
information available in amino acid sequence, and the more
sophisticated algorithm in protein-protein BLAST.
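The effect of codon degeneracy can be illustrated with a short Python sketch. The two coding sequences below are made up so that they differ only at third codon positions; they share just two thirds of their nucleotides, yet their translations are identical. The helpers `translate` and `identity` are illustrative, and `translate` assumes the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first position varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate a coding sequence in frame 1, codon by codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def identity(a: str, b: str) -> float:
    """Fraction of aligned positions that are identical (equal-length, ungapped)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical coding sequences that differ only at third (synonymous) codon positions.
seq1 = "GCTGCCGAAAAA"
seq2 = "GCAGCGGAGAAG"
print(round(identity(seq1, seq2), 3))              # 0.667 at the nucleotide level
print(translate(seq1), translate(seq2))            # AAEK AAEK
print(identity(translate(seq1), translate(seq2)))  # 1.0 at the protein level
```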
|
| "Search for short and near exact matches" under Nucleotide BLAST is useful for primer or short nucleotide motif searches. |
Short sequences (fewer than 20 bases) will often not find
any significant matches to the database entries under the
standard nucleotide-nucleotide BLAST settings. The usual
reasons are that the significance threshold, governed by the
expect value parameter, is too stringent and that the
default word size is too large.
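The expect value threshold matters because a short query cannot reach a high score. A rough illustration uses the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where m is the query length, n the database length and S the alignment score. The sketch below assumes a +1/-3 match/mismatch scheme, lambda = 1.374 and K = 0.711 (commonly tabulated ungapped values for that scheme), and a hypothetical database of 10 billion bases; it ignores the edge-effect corrections BLAST applies, so treat the numbers as order-of-magnitude only.

```python
import math


def expect_value(score: float, query_len: int, db_len: float,
                 lam: float = 1.374, k: float = 0.711) -> float:
    """Simplified Karlin-Altschul expect value, E = K * m * n * exp(-lambda * S).

    The lam and k defaults are assumed ungapped values for a +1/-3 reward/penalty
    scheme; real BLAST also corrects m and n for edge effects, which is omitted here.
    """
    return k * query_len * db_len * math.exp(-lam * score)


DB_LEN = 1e10  # hypothetical database size, in bases

for length in (15, 20):
    # A perfect, ungapped match scores +1 per base under the assumed scheme.
    e = expect_value(score=length, query_len=length, db_len=DB_LEN)
    print(f"{length}-base perfect match: E = {e:.3g}")
```

With these assumed numbers, a perfect 15-base match gets an E value of roughly 120, which fails the standard cut-off of 10 but passes the relaxed cut-off of 1000, whereas a perfect 20-base match falls below 1.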
You can adjust both the word size and the expect value on
the standard BLAST pages to work with short sequences.
However, we do provide a BLAST page with these values
preset to give optimum results with short sequences. This
page ("Search for short and nearly exact matches") is
linked under the nucleotide BLAST section of the main BLAST
page. The adjustments are described in the table
below.
| Program | Word Size | Filter Setting | Expect Value |
| Standard Nucleotide BLAST | 11 | On (DUST) | 10 |
| Search for short/near exact matches | 7 | Off | 1000 |
A common use of this page is to check the specificity of
primers used in the polymerase chain reaction (PCR) or
hybridization. A useful way to check a pair of PCR primers
is to concatenate them and search them as one sequence. The
forward primer and the reverse primer can simply be pasted
together with a string of ten or more N's between the two
sequences. Since BLAST looks for local alignments and
searches both strands, there is no need to reverse
complement one of the primers before doing the
concatenation or the search.
NOTE: The query sequence should contain no
ambiguous bases. Consensus motifs with degenerate bases will not work for this
type of search.
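The primer concatenation described above is easy to script. The sketch below uses a hypothetical helper, `concatenate_primers`, that joins the two primers with ten N's, leaves the reverse primer as written (BLAST searches both strands), and rejects primers containing ambiguous bases, in line with the note above.

```python
import re


def concatenate_primers(forward: str, reverse: str, spacer_len: int = 10) -> str:
    """Join a forward and reverse primer with a run of N's into one BLAST query.

    The reverse primer is used exactly as written; BLAST searches both strands,
    so it does not need to be reverse complemented first.
    """
    if spacer_len < 10:
        raise ValueError("use a spacer of ten or more N's")
    for name, primer in (("forward", forward), ("reverse", reverse)):
        # Degenerate bases would be scored as mismatches, so reject them up front.
        if not re.fullmatch(r"[ACGT]+", primer.upper()):
            raise ValueError(f"{name} primer contains non-ACGT characters: {primer!r}")
    return forward.upper() + "N" * spacer_len + reverse.upper()


print(concatenate_primers("ACGTACGTACGTACGTACGT", "ttgcaaccggttaacc"))
```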
|
| Standard protein BLAST is designed for protein searches |
Standard protein-protein BLAST (blastp) is used both for
identifying a query amino acid sequence and for finding
similar sequences in protein databases. Like other BLAST
programs, blastp is designed to find local regions of
similarity. However, when the similarity spans the whole
sequence, the local alignment blastp reports extends across
the entire length, effectively a global alignment, which is
the preferred result for protein identification purposes.
Unlike nucleotide BLAST, there is no comparable MEGABLAST for
protein searches.
|
| PSI-BLAST is designed for more sensitive protein-protein similarity searches. |
Position-Specific Iterated (PSI)-BLAST is the most sensitive
BLAST program, making it useful for finding very distantly
related proteins. Use PSI-BLAST when your standard
protein-protein BLAST search either failed to find
significant hits, or returned hits with descriptions such as
"hypothetical protein" or "similar to..."
The first round of PSI-BLAST is a standard protein-protein
BLAST search. The program builds a position-specific scoring
matrix (PSSM or profile) from an alignment of the sequences
returned with Expect values better (lower) than the inclusion
threshold (0.005 by default). In the second iteration the
PSSM becomes the query in the search. Any new database hits
below the inclusion threshold are included in a new PSSM. The
PSI-BLAST search is said to have converged when no more new
database sequences are added in subsequent iterations.
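The PSSM-building step can be sketched as a table of log-odds scores, one column per alignment position. This is a deliberately simplified version: it assumes an ungapped alignment, uniform background frequencies and a fixed pseudocount, whereas PSI-BLAST itself uses sequence weighting, real background frequencies and more elaborate pseudocount rules. The helpers `build_pssm` and `score` are hypothetical names.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1 / len(AMINO_ACIDS)  # assumption: uniform background frequency


def build_pssm(alignment: list, pseudocount: float = 1.0) -> list:
    """Build a log-odds PSSM (in bits) from an ungapped alignment of equal-length sequences."""
    n = len(alignment)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        scores = {}
        for aa in AMINO_ACIDS:
            # Pseudocounts keep unseen residues from getting a probability of zero.
            freq = (counts[aa] + pseudocount * BACKGROUND) / (n + pseudocount)
            scores[aa] = math.log2(freq / BACKGROUND)
        pssm.append(scores)
    return pssm


def score(pssm: list, seq: str) -> float:
    """Score an ungapped sequence against the PSSM, one column per residue."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


hits = ["ACDK", "ACEK", "SCDK"]  # hypothetical hits that passed the inclusion threshold
pssm = build_pssm(hits)
print(score(pssm, "ACDK") > score(pssm, "WWWW"))  # True
```

In the iterative scheme described above, a matrix like this replaces the original query in the next round, so residues conserved across the included hits weigh more heavily than in a single-sequence search.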
You can add database hits that fall outside the inclusion
threshold to your PSSM for the next round by checking the box
next to the hit.
You can also save a PSSM created during a PSI-BLAST search of
one database and use it to search a different database. To do
this, change "Alignment" to "PSSM" in a pulldown menu in the
Format section of a "formatting BLAST" page (at any iteration
after the first). Then format the search, copy the resulting
PSSM and paste it into the Options section of a new PSI-BLAST
search page.
|
| PHI-BLAST can do a restricted protein pattern search. |
Pattern-Hit Initiated (PHI) BLAST is designed to search for
proteins that contain a pattern specified by the user, AND
are similar to the query sequence in the vicinity of the
pattern. This dual requirement is intended to reduce the
number of database hits that contain the pattern, but are
likely to have no true homology to the query.
To run PHI-BLAST, enter your query (which contains one or
more instances of the pattern) into the "Search" box, and
enter your pattern into the "PHI pattern" box in the
"Options" section. Patterns must follow the syntax
conventions of PROSITE. The documentation on Pattern Syntax
is at: http://www.ncbi.nlm.nih.gov/blast/html/PHIsyntax.html.
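To check which of your sequences contain a PROSITE pattern before running PHI-BLAST, the pattern can be translated into a regular expression. The converter below is a minimal sketch that covers only common syntax (x, [..], {..}, repeat counts and the < and > anchors at element boundaries); it is not NCBI's parser, and `prosite_to_regex` is a hypothetical name.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern into a Python regular expression.

    Handles x, [..], {..}, repeat counts (n) or (n,m), and the < and > anchors.
    """
    elements = pattern.strip().rstrip(".").split("-")
    parts = []
    for element in elements:
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


# A C2H2 zinc finger pattern, used here only as an example.
regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
print(regex)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H

peptide = "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H"  # hypothetical
print(bool(re.search(regex, peptide)))  # True
```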
|
| The protein version of "Search for short nearly exact matches" is optimized to find matches to a short peptide sequence. |
A short peptide (a 10- to 15-mer or shorter) often will not
find any significant matches to the database under the
standard protein-protein BLAST settings. The usual reasons
are that the significance threshold, governed by the expect
value parameter, is too stringent and that the default word
size is too large.
To use a short peptide sequence as a query, you could
adjust both the word size and the expect value on the
standard BLAST pages to make it work with short sequences.
However, we provide a separate BLAST page with these values
preset to optimize blastp searches with short query
sequences. This page, "Search for short nearly exact
matches", is available via a link under the Protein BLAST
section of the BLAST home page. In addition to changing the
Expect value cutoff and word size, the more stringent PAM30
scoring matrix replaces the BLOSUM62 matrix. This page also
turns off the composition-based statistics feature in
standard blastp, which takes the amino acid composition of
the query sequence into account when calculating the score
and significance of the alignments. NOTE:
Composition-based statistics can have a large effect on
searches using queries with a biased amino acid
composition. Because they contain so few residues, short
peptides almost always have a biased composition and should
not be searched with composition-based statistics.
Because the query must be at least twice the word size, it
can be as short as 4 residues when the word size is set to 2,
but queries shorter than 5 residues are not recommended. In
addition, ambiguous residues break the query into separate
pieces for seeding, so the query should contain no
ambiguities if the entire sequence is to be used to seed the
initial search. You can also modify the settings on the
"Protein query - Translated db [tblastn]" page to find
nucleotide matches for a short peptide. A summary of the
settings for short peptide searches is given below:
| Program | Word Size | Filter | E Value | Composition-based Statistics | Score Matrix |
| Standard protein BLAST | 3 | On (SEG) | 10 | On | BLOSUM62 |
| Search for short/nearly exact matches | 2 | Off | 20000 | Off | PAM30 |
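The query-length and ambiguity rules for short peptides can be checked before submitting a search. This sketch assumes that B, Z and X are the residues that break the query for seeding and uses hypothetical helper names; it only restates the guidance in this section.

```python
import re

AMBIGUOUS = "BZX"  # assumption: residue codes treated as ambiguous here


def seedable_segments(query: str, word_size: int = 2) -> list:
    """Split a peptide at ambiguous residues; keep pieces long enough to supply a seed word."""
    pieces = re.split(f"[{AMBIGUOUS}]", query.upper())
    return [piece for piece in pieces if len(piece) >= word_size]


def short_peptide_warnings(query: str, word_size: int = 2) -> list:
    """List potential problems with a short peptide query."""
    warnings = []
    if len(query) < 2 * word_size:
        warnings.append(f"query is shorter than twice the word size ({2 * word_size} residues)")
    if len(query) < 5:
        warnings.append("queries shorter than 5 residues are not recommended")
    if re.search(f"[{AMBIGUOUS}]", query.upper()):
        warnings.append("ambiguous residues split the query; seeds can only come from "
                        + ", ".join(seedable_segments(query, word_size)))
    return warnings


print(short_peptide_warnings("ACDXEFGHIK"))
```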
|
| The "Nucleotide query - Protein db [blastx]" is useful for finding similar proteins to those encoded by a nucleotide query. |
Translated BLAST services are useful when trying to find
homologous proteins to a nucleotide coding region. Blastx
compares the translation of the nucleotide query sequence to
a protein database. Because blastx translates the query
sequence in all six reading frames and provides combined
significance statistics for hits to different frames, it is
particularly useful when the reading frame of the query
sequence is unknown or it contains errors that may lead to
frame shifts or other coding errors. Thus a blastx search is
often the first analysis performed with a read from a newly
derived sequence and is used extensively in analyzing EST
sequences.
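Six-frame translation, the step that makes blastx tolerant of an unknown reading frame, can be sketched in Python. The helper `six_frame_translations` is illustrative: it assumes the standard genetic code, labels frames +1 to +3 on the given strand and -1 to -3 on the reverse complement, and translates any codon containing an ambiguous base as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first position varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq: str) -> str:
    """Translate codon by codon; codons with ambiguous bases become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> dict:
    """Translate a nucleotide sequence in all three frames of both strands."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse_complement[offset:])
    return frames


for frame, protein in six_frame_translations("ATGAAATAG").items():
    print(frame, protein)
```

A frame shift in the read simply moves the correct protein from one frame to another partway along, which is why statistics combined across frames help.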
|
| The "Protein query - Translated db [tblastn]" search is useful for finding protein homologs in unannotated nucleotide data. |
A tblastn search allows you to compare a protein sequence to
the six-frame translations of a nucleotide database. It can
be a very productive way of finding homologous protein coding
regions in unannotated nucleotide sequences such as expressed
sequence tags (ESTs) and draft genome records (HTG), located
in BLAST databases est and htgs, respectively.
ESTs are short, single-read cDNA sequences. These comprise
the largest pool of sequence data for many organisms and
contain portions of transcripts from many uncharacterized
genes. Since ESTs have no annotated coding sequences, there
are no corresponding protein translations in the BLAST
protein databases. Hence a tblastn search is the only way to
search for these potential coding regions at the protein
level. The HTG sequences, draft sequences from various genome
projects or large genomic clones, are another large source of
unannotated coding regions.
Like all translating searches, the tblastn search is
especially suited to working with error prone data like ESTs
and draft genomic sequences from HTG because it combines
BLAST statistics for hits to multiple reading frames and thus
is robust to frame shifts introduced
by sequencing error.
|
| The "Nucleotide query - Translated db [tblastx]" is useful for identifying novel genes in error prone query sequence. |
tblastx takes a nucleotide query sequence, translates it in
all six frames, and compares those translations to the
database sequences dynamically translated in all six frames.
This effectively performs a more sensitive blastp search
without doing the manual translation.
tblastx gets around potential frame shifts and
ambiguities that may prevent certain open reading frames from
being detected. This makes it very useful for identifying
potential proteins encoded by single-pass EST reads, and a
good tool for identifying novel genes.
NOTE: This type of search is computationally intensive,
and searches with large genomic queries are not recommended.
For such searches, install standalone BLAST and perform the
search locally. For more information on standalone BLAST,
please read the document for standalone BLAST and formatdb.
|
| The Conserved Domain Database (CDD) search service uses RPS-BLAST to identify conserved protein domains. |
Reverse Position Specific BLAST (RPS-BLAST) is a more
sensitive way of identifying conserved domains in proteins
than standard BLAST searching. It compares a protein sequence
against a database of position specific scoring matrices
(PSSMs). The PSSMs used in CDD search capture the
substitution frequencies at each position in the multiple
sequence alignments of recognized conserved domains. These
conserved domain alignments are from three protein domain
databases: SMART, PFAM, and LOAD. For additional information,
go to CDD help.
|
| The Conserved Domain Architecture Retrieval Tool (CDART) explores the domain architectures of proteins. |
CDART allows you to examine the domain structure of all
proteins in the default BLAST protein database. The CDART
tool first searches a query sequence for the presence of
conserved domains using RPS-BLAST. It then allows you to
retrieve proteins that share one or more protein domains in
common with your query. Because CDART relies on RPS-BLAST,
these searches are more sensitive than ordinary BLAST
searches.
NOTE: If the query does not contain any conserved
domains, CDART will not report any result.
|
| "BLAST 2 Sequences" is designed for direct comparison of two sequences. |
This program takes two input sequences and compares them
directly. Unlike the other BLAST programs, there is no need
to format the database sequence in any special way. Please
note that "BLAST 2 Sequences" regards the second sequence
as the database. If the database sequence or second query
is present in NCBI databases, using GI/Accession instead of
the FASTA sequence would allow the program to incorporate
the translation and other sequence features, found in that
record, into the final result to make it more
informative.
Since translated BLAST programs are incorporated in this
program, the second sequence can be of a different type as
long as an appropriate BLAST program is selected.
Appropriate Query/Program combinations are given in the
table below.
| First Sequence | Second Sequence | Program to Use |
| Nucleotide | Nucleotide | blastn or tblastx |
| Nucleotide | Protein | blastx |
| Protein | Nucleotide | tblastn |
| Protein | Protein | blastp |
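The table can be expressed as a simple lookup. The `guess_type` helper is a hypothetical addition that calls a sequence nucleotide when at least 90% of its letters are A, C, G, T, U or N; that threshold is an assumption, not the rule BLAST uses.

```python
PROGRAMS = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "nucleotide"): ["tblastn"],
    ("protein", "protein"): ["blastp"],
}


def guess_type(seq: str, threshold: float = 0.9) -> str:
    """Crude guess: call a sequence nucleotide if most letters are A, C, G, T, U or N.

    The 0.9 threshold is an arbitrary assumption, not BLAST's actual rule.
    """
    letters = [c for c in seq.upper() if c.isalpha()]
    nucleotide_like = sum(c in "ACGTUN" for c in letters)
    return "nucleotide" if letters and nucleotide_like / len(letters) >= threshold else "protein"


def programs_for(first: str, second: str) -> list:
    """Look up the BLAST 2 Sequences program(s) for a pair of sequence types."""
    return PROGRAMS[(first, second)]


print(programs_for(guess_type("ATGGCCAAGTAA"), guess_type("MAK")))  # ['blastx']
```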
|
| The Human Genome BLAST page is for comparing a query against NCBI's assembly of the human genome, its derivatives and/or other related databases. |
Like other BLAST search pages in this section, this page
provides a centralized page to access specialized
databases. In this case, the databases are the current NCBI
human genome build and those derived from or related to
it.
All flavors of BLAST, except tblastx, are available with
MEGABLAST set as default. Default filters are DUST and
Human Repeat. The BLAST output links directly to the Human
Genome MapViewer, where database hits can be analyzed in a
genomic context, such as their relationship to other map
elements like ESTs, SNPs, and other predicted genes. The
complete list of databases available for searching is
given below.
| Human Genome BLAST Databases |
| Database | Content |
| genome (default) | human genomic contig sequences with NT_#### accessions |
| mrna | human RefSeq mRNAs with NM_#### or XM_#### accessions |
| protein | human RefSeq proteins with NP_#### or XP_#### accessions |
| gscan_mrna | predicted mRNA sequences generated by running the GenomeScan program on human genomic contigs |
| gscan_protein | CDS translations from the gscan_mrna set |
| BAC end sequences | BAC ends from GSS (?) |
| HTGS | human entries from the GenBank htg division |
| ESTs | human subset of the GenBank est division |
| EST Traces | human ESTs from the Trace Archive |
| Other Traces | other human entries found in the Trace Archive |
|
| Use the Mouse Genome BLAST page to search preliminary assemblies as well as other mouse sequence databases. |
The organization of this page is similar to that of Human
Genome BLAST page. Note that the double translated BLAST
program, tblastx, is not available on this page due to its
high computational intensity. MEGABLAST is the default
algorithm and both low complexity filtering (DUST) and rodent
repeat filtering are on by default.
The default database "curated NT contigs" is analogous to the
human genome database "genome". However, much less of the
mouse genome has been assembled into contigs. The databases
available for searching are listed on this page.
|
| The Microbial Genome BLAST page provides centralized access to complete and unfinished bacterial/archaeal genomes. |
This page provides access to many complete and some
unfinished bacterial/archaeal genomes. For a complete list of
the genomes on this page, follow the link provided there.
The primary dataset is the DNA (the genomes), with Protein as
the derivative dataset. Due to the lack of annotation, the
protein dataset may not be available (selectable but with
empty database) for unfinished genomes. You can choose to
search against all the genomes or a selected subset of them,
and all flavors of BLAST programs are available.
NOTE: BLAST hits to an unfinished genome do not
contain links to GenBank entries, since those genomes have
not been deposited in GenBank.
|
| The Other eukaryotes BLAST page provides access to genomic sequences of other eukaryotic organisms. |
In addition to the human, mouse, and microbial genomes
mentioned above, genomic sequences for many other organisms
are also available. The prominent, high-impact genomes are
listed separately, and the others are grouped on this page.
The exact sequences available vary depending on the stage of
the sequencing projects.
For a list of the organisms represented in the BLAST
database, please check this page.
|
| Use the Rat Genome BLAST page to search preliminary assemblies as well as other rat sequence databases. |
This page provides access to BLAST databases specific for
rat. Compared with human and mouse, only limited genomic
sequences are available, and there are no assembled contigs.
The contents are explained below.
|
| Content of Rat Genome BLAST Databases |
| Database | Content |
| HTGS | Rat phase 0, phase 1, phase 2 or phase 3 sequence. These are the original BAC sequences as submitted by the sequencing centers. |
| Traces | All of the raw rat WGS and BAC traces |
| BAC ends | The end sequences of BACs from CHORI-230, sequenced at TIGR. |
| Reference mRNAs | Collection of reference mRNAs generated by the NCBI RefSeq project. |
| Reference Proteins | Collection of reference proteins generated by the NCBI RefSeq project. |
| ESTs | Single-pass sequence reads from numerous rat cDNA libraries |
|
| Use the Fugu genome BLAST page to search against the draft Fugu rubripes (Puffer fish) genome. |
This page provides access to the draft genome and the protein
translation of Fugu rubripes (Japanese Puffer fish). This
genome assembly is provided by the DOE's Joint Genome
Institute. For details on the databases and their release
policy, please go to JGI's Fugu site.
Similar BLAST searches against this genome assembly can also
be done there.
|
| Use the Zebrafish Genome BLAST page to search against Zebrafish-specific sequences. |
Currently there are no finished genomic contigs for this
organism; the content of the available databases is explained
below.
|
| Content of Zebrafish Genome BLAST Databases |
| Database | Content |
| mRNAs | Zebrafish mRNAs in GenBank. |
| ESTs | Single-pass sequence reads from numerous Zebrafish cDNA libraries. |
| HTGS | Zebrafish phase 0, phase 1, phase 2 or phase 3 sequence. These are the original BAC sequences as submitted by the sequencing centers. |
| Traces | All of the raw Zebrafish WGS, BAC and EST traces. |
| Reference mRNAs | Collection of reference mRNAs generated by the NCBI RefSeq project. |
| Reference Proteins | Collection of reference proteins generated by the NCBI RefSeq project. |
|
| Use the Arabidopsis thaliana genome BLAST page to search against the Arabidopsis genome. |
This page provides access to the sequenced chromosome clones
of Arabidopsis thaliana, the mRNA sequences predicted from
them, and the translations of those mRNAs. Links to the genome
MapViewer are also provided for the identified hits. Direct
searches with text terms can be done on the
Arabidopsis thaliana genome MapViewer page.
|
| Use the Oryza sativa genome BLAST page to search against the rice genome. |
This page provides access to the super contig assemblies of
rice. The data available are from a publicly funded Chinese
rice genome project, and the sequence is from Oryza
sativa L. ssp. indica. For more details, please refer to the
Rice Genome MapViewer page.
|
| Use the Anopheles gambiae genome BLAST page to search against the mosquito genome. |
This page provides access to the genome scaffolds of
Anopheles gambiae. The data come from a publicly funded NIAID
project; the sequencing and assembly were done by Celera. For
more details, please refer to the
Anopheles gambiae Genome MapViewer page.
|
| The VecScreen page is for identifying vector sequence contamination in a query sequence. |
VecScreen is a rapid screening tool that checks the query
sequence against a non-redundant vector database, UniVec,
which contains one copy of every unique sequence segment from
a large number of vectors. In addition, UniVec contains
sequences for adapters, linkers and primers that are commonly
used in the cloning of cDNA or genomic DNA. Detailed
information on UniVec is at: http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html.
This page is generally used to screen a sequence for vector
contamination before it is submitted to a public sequence
database.
|
| Use the Trace Archive BLAST page to search raw, unassembled and unannotated primary sequence trace files. |
Trace data files are a rich source of information, especially
for organisms lacking a significant amount of assembled
genomic sequence. The sequences come from a variety of
projects and sequencing strategies, including Whole Genome
Shotgun (WGS), BAC end sequencing, and EST sequencing. The
trace data are single pass sequencing reads not trimmed for
quality or vector sequences. Their average length is
between 500 and 700 bp.
A search from the Trace Archive BLAST page uses MEGABLAST
exclusively and offers the same user-selected options as on
the MEGABLAST web page. Information on the Trace data is
available from this page.
|
| Web MEGABLAST can accept batch queries. |
MEGABLAST is the only BLAST web service that can accept
multiple queries. There are two ways to enter batch queries
in MEGABLAST. If the query sequences are not present in the
NCBI Entrez system, those sequences need to be pasted in the
search box in FASTA format, one after another with no blank
lines in between sequences. The FASTA definition line (or
title) of each sequence should be on a single line all by
itself. Alternatively, if those sequences are already in a
text file in proper format, the file can be uploaded using
the "Browse" button. An example query file with multiple
sequences is given below.
>Sequence_1
AGACAGATCACTTCAGTCGCCACAATTAGCCATGGATAAGATACACCATTGCCATC
>Sequence_2
AGACAACTTCAGTCGCCGATCACTCGCCACAATTTCAGTCGCCATAAGGCAATTAT
If the query sequences are already present in Entrez, their
GI or Accession numbers can be pasted in the search box, one
identifier per line.
U12345
F12564
BH023812
A text file containing those numbers in this format can be
uploaded through the "Browse" button rather than
copy/paste.
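Both batch formats can be generated with a short script. This sketch, with hypothetical helper names, writes each FASTA definition line on a single line, puts each sequence on one line with whitespace removed, and leaves no blank lines between records; the identifier variant writes one GI or accession number per line.

```python
def format_fasta_batch(records: dict) -> str:
    """Format title -> sequence pairs as FASTA text for a batch query.

    Each definition line stays on one line, and no blank lines separate records.
    """
    lines = []
    for title, seq in records.items():
        lines.append(">" + " ".join(title.split()))  # collapse any line breaks in the title
        lines.append("".join(seq.split()).upper())    # one sequence line, whitespace removed
    return "\n".join(lines) + "\n"


def format_identifier_batch(identifiers: list) -> str:
    """Format GI or accession numbers one per line."""
    return "\n".join(identifier.strip() for identifier in identifiers) + "\n"


print(format_fasta_batch({"Sequence_1": "AGACAGATCA CTTCAG", "Sequence_2": "agacaacttc\nagtcgcc"}), end="")
print(format_identifier_batch(["U12345", "BH023812"]), end="")
```

The resulting text can be pasted into the search box or saved to a file for upload.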
|
| Degenerate bases and ambiguity codes are treated as mismatches by BLAST. |
Uncertainties in a nucleotide sequence can be represented
by a standard set of single-letter ambiguity codes given in
the table below.
| Code | Meaning (Base) | Code | Meaning (Base) |
| A | adenine (A) | M | amino (A or C) |
| C | cytosine (C) | S | strong (G or C) |
| G | guanine (G) | W | weak (A or T) |
| T | thymine (T) | B | not A (G or T or C) |
| U | uracil (U) | D | not C (G or A or T) |
| R | purine (G or A) | H | not G (A or C or T) |
| Y | pyrimidine (T or C) | V | not T (G or C or A) |
| K | keto (G or T) | N | any base (A or G or C or T) |
| - | gap(s) (none) |
|
These are often used to represent degenerate bases in the
third position of codons in degenerate oligonucleotide
primers, or in a less conserved region of a sequence motif.
Although this alphabet is accepted by BLAST, the BLAST
program treats such ambiguities as mismatches in alignment.
In short queries, such as primer sequences, these ambiguous
bases may prevent BLAST from finding any exact word matches
in the database as long as the word size. Another side
effect of too many ambiguities is that blastn may interpret
your query sequence as protein and give an error message.
NOTE: dashes (-) in the
query are not accepted. Web blast programs will strip them
before submitting the search. If gaps are desired, use N's
instead of dashes.
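Because an ambiguity code never matches exactly, a query can only seed alignments from stretches made entirely of A, C, G and T (or U). The sketch below, with hypothetical helper names, finds the longest such stretch and compares it with the word size, a quick way to predict whether a degenerate primer can produce any hits. It assumes every ambiguity code breaks a word.

```python
import re


def longest_unambiguous_run(query: str) -> int:
    """Length of the longest stretch made only of A, C, G, T or U."""
    runs = re.findall(r"[ACGTU]+", query.upper())
    return max((len(run) for run in runs), default=0)


def can_seed(query: str, word_size: int) -> bool:
    """Assumes every ambiguity code breaks a word, so seeds need an unambiguous stretch."""
    return longest_unambiguous_run(query) >= word_size


primer = "ACGRTTNCAGYTAGCK"  # hypothetical degenerate primer
print(longest_unambiguous_run(primer), can_seed(primer, word_size=7))  # 4 False
```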
For those
programs that use amino acid query sequences (BLASTP and
TBLASTN), the IUPAC based amino acid codes are given in the
table below.
| Code | Residue | Code | Residue |
| A | alanine | P | proline |
| B | aspartate or asparagine | Q | glutamine |
| C | cysteine | R | arginine |
| D | aspartate | S | serine |
| E | glutamate | T | threonine |
| F | phenylalanine | U | selenocysteine |
| G | glycine | V | valine |
| H | histidine | W | tryptophan |
| I | isoleucine | Y | tyrosine |
| K | lysine | Z | glutamate or glutamine |
| L | leucine | X | any residue |
| M | methionine | * | translation stop |
| N | asparagine | - | gap of indeterminate length |
Blastp treats the ambiguous or non-standard codes (for
example B, Z and X) as mismatches in alignment.
Web BLAST programs regard dashes (-) as illegal characters
and will remove them before starting the search. Any U's
present in the query are replaced with X by Web BLAST before
the query is submitted. NOTE: If the presence of gaps is
desired, use a string of X's instead of dashes.
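The clean-up rules for protein queries can be mimicked locally, for example to see exactly what sequence will be searched. This sketch with a hypothetical helper name removes whitespace and dashes, replaces U with X, and optionally converts dashes to X so that gap positions are preserved.

```python
import re


def prepare_protein_query(query: str, keep_gaps_as_x: bool = False) -> str:
    """Mimic the described web clean-up of a protein query.

    Dashes are removed (or, if keep_gaps_as_x is True, turned into X so that the
    gap positions survive), and U is replaced with X.
    """
    cleaned = re.sub(r"\s+", "", query).upper()
    cleaned = cleaned.replace("-", "X" if keep_gaps_as_x else "")
    return cleaned.replace("U", "X")


print(prepare_protein_query("MKT-AU Y"))                       # MKTAXY
print(prepare_protein_query("MKT-AU Y", keep_gaps_as_x=True))  # MKTXAXY
```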
|
| Peptide sequence database content |
The content of the peptide sequence databases available for
BLAST searches is described below.
| Database | Content |
| nr | All non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF. |
| swissprot | Last major release of the SWISS-PROT protein sequence database (no incremental updates). |
| pat | Proteins from the Patent division of GenBank. |
| Yeast | Saccharomyces cerevisiae genomic CDS translations |
| ecoli | Escherichia coli genomic CDS translations |
| pdb | Sequences derived from the 3-dimensional structures in the Brookhaven Protein Data Bank |
| Drosophila genome | Drosophila genome proteins provided by Celera and the Berkeley Drosophila Genome Project (BDGP). |
| month | All new or revised GenBank CDS translations + PDB + SwissProt + PIR + PRF released in the last 30 days |
|
| Nucleotide sequence database content |
The content of the nucleotide sequence databases available
for BLAST searches is described below.
| Nucleotide Sequence Databases |
| Database | Content |
| nr | All GenBank + EMBL + DDBJ + PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences). No longer "non-redundant". |
| est | Database of GenBank + EMBL + DDBJ sequences from the EST division. |
| est_human | Human subset of GenBank + EMBL + DDBJ sequences from the EST division. |
| est_mouse | Mouse subset of GenBank + EMBL + DDBJ sequences from the EST division. |
| est_others | Non-mouse, non-human GenBank + EMBL + DDBJ sequences from the EST division. |
| gss | Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences. |
| htgs | Unfinished High Throughput Genomic Sequences: phases 0, 1 and 2. Finished, phase 3 HTG sequences are in nr. |
| pat | Nucleotides from the Patent division of GenBank |
| yeast | Saccharomyces cerevisiae genomic nucleotide sequences |
| mito | Database of mitochondrial sequences |
| vector | Vector subset of GenBank(R), NCBI, in ftp://ftp.ncbi.nlm.nih.gov/blast/db/ |
| ecoli | Escherichia coli genomic nucleotide sequences |
| pdb | Sequences derived from the 3-dimensional structures in the Brookhaven Protein Data Bank. |
| Drosophila genome | Drosophila genome provided by Celera and the Berkeley Drosophila Genome Project (BDGP) |
| month | All new or revised GenBank + EMBL + DDBJ + PDB sequences released in the last 30 days. |
| alu | Select Alu repeats from REPBASE, suitable for masking Alu repeats from query sequences. It is available by FTP from ftp://ftp.ncbi.nlm.nih.gov/blast/db/alu.n.Z. See "Alu alert" by Claverie and Makalowski, Nature 371: 752 (1994). |
| dbsts | Database of GenBank + EMBL + DDBJ sequences from the STS division. |
| chromosome | Searches complete genomes, complete chromosomes, or contigs from the NCBI Reference Sequence project. |
| wgs_anopheles | Anopheles gambiae (mosquito) whole genome shotgun sequences |
|