
Searching NCBI BLAST through the URLAPI consists of two major steps. The first step
is called ''Put''; it submits the query sequence, together with the appropriate
search parameters, to the system. The second step is called ''Get''; it retrieves
the results and formats them according to the specified format parameters.
The following example shows how this can be done using the URLAPI.
Suppose we want to run a nucleotide search with the query sequence gi=555 against the 'nr' database. In addition, we want to enable low-complexity filtering, set the expect value to 10, request HTML output, use NCBI GI numbers in the output page, and show only the first 10 hits. With the URLAPI, the first step looks like this:
$ lynx -source \
"http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?QUERY=555\
&DATABASE=nr&HITLIST_SIZE=10&FILTER=L\
&EXPECT=10&FORMAT_TYPE=HTML&PROGRAM=blastn&CLIENT=web\
&SERVICE=plain&NCBI_GI=on&PAGE=Nucleotides\
&CMD=Put"
Note that this example uses the ''lynx'' program and splits the long command across several lines with trailing backslashes, as 'sh' syntax allows.
In the ''url-encoded'' format, the '?' marks the start of the parameter list, which consists
of name=value pairs separated by '&'. The parameter names are largely self-explanatory, and each
is explained in more detail in the parameter section of this document. Note that we set ''CMD=Put'',
which means that we want to put this new search into the system. It is also possible to specify
a query sequence in FASTA format, for example ''QUERY=acgtacgt''. For those who work with the NCBI toolkit and
are familiar with the Bioseq C structure, here is code that converts an NCBI Bioseq to FASTA format:
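/* Read the Bioseq for entry 'oid' from the open database 'rdfp'. */
bsp = readdb_get_bioseq_ex(rdfp, oid, TRUE, TRUE);
/* Write it to 'fp' as FASTA; report an error if the conversion fails. */
if (!BioseqToFasta(bsp, fp, !is_prot)) {
    ErrPostEx(SEV_ERROR, 0, 0, "Can't convert Bioseq to FASTA");
}
BioseqFree(bsp);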
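The name=value list described above can also be assembled by a small shell helper rather than typed by hand. The following is a minimal sketch, not part of the URLAPI itself: the function names urlencode and build_query are hypothetical, and it assumes the server accepts standard percent-encoding, which matters when the QUERY value contains characters such as '>' or newlines.

```shell
#!/bin/sh
# Percent-encode a string: every byte other than an unreserved character
# (A-Z a-z 0-9 - . _ ~) is written as %XX. (Hypothetical helper.)
urlencode() {
    printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | while read -r hex; do
        [ -n "$hex" ] || continue
        case "$hex" in
            3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
                # Unreserved byte: print the character itself.
                printf "\\$(printf '%03o' "0x$hex")" ;;
            *)
                # Any other byte: print it as %XX with uppercase hex digits.
                printf '%%%s' "$(printf '%s' "$hex" | tr 'a-f' 'A-F')" ;;
        esac
    done
}

# Join NAME=VALUE arguments with '&', encoding each value. (Hypothetical helper.)
build_query() {
    query=""
    for pair in "$@"; do
        name=${pair%%=*}    # text before the first '='
        value=${pair#*=}    # text after the first '='
        query="${query:+$query&}$name=$(urlencode "$value")"
    done
    printf '%s\n' "$query"
}
```

With these helpers, the parameter list of a request could be produced with a call such as: build_query CMD=Put QUERY=555 DATABASE=nr PROGRAM=blastn, and the result appended after the '?' in the URL passed to lynx.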
The output of the 'Put' command is a valid HTML page, the contents of which may be ignored except for the following important section:
<!--QBlastInfoBegin
RID = 954517067-8610-1647
RTOE = 207
QBlastInfoEnd
-->
This special portion of the output contains the Request Identifier (RID) and the estimated Request Time of Execution (RTOE), in seconds, for the search. The RID is different for every search and is a mandatory parameter for the next step, which formats the BLAST results.
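A script that submits searches needs to pull the RID and RTOE out of the returned page. The following is a minimal sketch that reads a page on standard input and prints one field from the QBlastInfo block. The function name qblast_field is hypothetical, and the sketch assumes each field appears on its own line in the "NAME = value" form shown above.

```shell
#!/bin/sh
# Print the value of one field (for example RID or RTOE) from the
# QBlastInfo comment block of a page read on standard input.
# (Hypothetical helper; assumes one "NAME = value" field per line.)
qblast_field() {
    # First keep only the lines between the two markers, then extract the value.
    sed -n '/QBlastInfoBegin/,/QBlastInfoEnd/p' |
        sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*\$/\1/p" |
        head -n 1
}
```

For example, if the output of the 'Put' step has been saved in a file named put.html (a file name chosen here for illustration), the RID could be read with: rid=$(qblast_field RID < put.html)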
The simplest way to get results for a given RID using the default format parameters is to use the following URL:
$ lynx -source "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?\
RID=954517013-7639-11119&CMD=Get"
If the BLAST search is not yet complete, this will produce output in the following format with Status equal to "WAITING":
<!--QBlastInfoBegin
Status=WAITING
QBlastInfoEnd
--><p>
If the search is complete, the output contains the formatted results, preceded by status information with Status=READY:
<!--QBlastInfoBegin
Status=READY
QBlastInfoEnd
--><p>
... <formatted output here>
If your search has a Status of WAITING, it is advisable to wait a few seconds before requesting the results again.
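The wait-and-retry advice above can be automated with a simple polling loop. The following is a minimal sketch under stated assumptions: the function names, the default delay of 10 seconds, and the limit of 30 attempts are all hypothetical choices, and any Status other than WAITING or READY is treated as a failure because only those two values are described here. The RTOE value returned by the 'Put' step could serve as a guide when choosing the delay.

```shell
#!/bin/sh
BLAST_URL="http://www.ncbi.nlm.nih.gov/blast/Blast.cgi"

# Fetch a URL and write the page to standard output, using lynx as in
# the examples in this section.
fetch_url() {
    lynx -source "$1"
}

# Print the Status value found in the QBlastInfo block on standard input.
qblast_status() {
    sed -n '/QBlastInfoBegin/,/QBlastInfoEnd/s/^[[:space:]]*Status[[:space:]]*=[[:space:]]*\([A-Za-z_]*\).*$/\1/p' |
        head -n 1
}

# Poll until the Status is no longer WAITING, then print the final page.
# $1 = RID, $2 = seconds between attempts (assumed default 10),
# $3 = maximum number of attempts (assumed default 30).
# Returns 0 on READY, 1 on a fetch error or unexpected status, 2 on timeout.
wait_for_results() {
    rid=$1
    delay=${2:-10}
    tries=${3:-30}
    while [ "$tries" -gt 0 ]; do
        page=$(fetch_url "$BLAST_URL?CMD=Get&RID=$rid") || return 1
        status=$(printf '%s\n' "$page" | qblast_status)
        if [ "$status" != "WAITING" ]; then
            printf '%s\n' "$page"
            [ "$status" = "READY" ] && return 0
            return 1
        fi
        sleep "$delay"
        tries=$((tries - 1))
    done
    return 2
}
```

A call such as: wait_for_results 954517013-7639-11119 > results.html would then keep requesting the page until the search finishes or the attempt limit is reached.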
Additional formatting options are also available, for example:
$ lynx -source "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?\
CMD=Get&RID=1234-566-897&FORMAT_OBJECT=Alignment\
&FORMAT_TYPE=HTML&DESCRIPTIONS=100&ALIGNMENTS=200\
&ALIGNMENT_TYPE=Pairwise&OVERVIEW=yes"
This is interpreted as "get the results for RID=1234-566-897, show the alignments in pairwise form, produce HTML output, display at most 100 descriptions and 200 alignments, and show the graphical overview".
For more details, please refer to the sections dealing with "Commands" and "Parameters". Individual descriptions for each command and parameter are available.