Many characters are unsafe when used within a URL. If these characters need to be included in a BLAST URLAPI URL, we need to escape them into hexadecimal codes or other safe counterparts to ensure that the search or retrieval request is parsed and executed properly. For reference, we list some commonly encountered unsafe characters and their accepted escaped counterparts below.
Table 8.3 Common Unsafe Characters and Their Escaped Counterparts
| Unsafe character | white space | > | [ | ] | new line | \| | % | @ | # |
|---|---|---|---|---|---|---|---|---|---|
| Safe replacement | + | %3E | %5B | %5D | %0D%0A | %7C | %25 | %40 | %23 |
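You can use a script to escape unsafe characters. The Perl example below performs several of the conversions listed in the table:
# QUERY_FILE is assumed to be a filehandle already opened on the query file
while (<QUERY_FILE>) {
    s/>/%3E/g;       # FASTA defline marker
    s/\n/%0D%0A/g;   # line break; must run before the whitespace rule, since \s also matches \n
    s/[ \t]+/+/g;    # runs of spaces or tabs
    s/\|/%7C/g;      # pipe
    print;
}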
...
You can also use the uri_escape() function found in the URI::Escape Perl module to perform this task:
use URI::Escape;
...
$vars='human[organism]';
$vars=uri_escape($vars);
...
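For shell-only workflows, the same kind of escaping can be sketched without Perl. The function below is a minimal illustration, not part of BLAST, lynx, or URI::Escape: `urlencode` is a hypothetical helper name, and it percent-encodes every character outside the RFC 3986 unreserved set (letters, digits, `.`, `_`, `~`, `-`). Note that it writes a space as `%20` rather than the `+` shown in the table above. It assumes ASCII input and uses plain (global) shell variables to stay portable.

```shell
# urlencode: hypothetical helper that percent-encodes its first argument.
# Assumes ASCII input; characters outside the unreserved set become %XX.
urlencode() {
    rest=$1
    out=
    while [ -n "$rest" ]; do
        next=${rest#?}            # everything after the first character
        c=${rest%"$next"}         # the first character itself
        rest=$next
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;                    # safe as-is
            *) out=$out$(printf '%%%02X' "'$c") ;;           # hex code of the character
        esac
    done
    printf '%s\n' "$out"
}

# Same input as the uri_escape() example above
urlencode 'human[organism]'
```

For the sample input this prints `human%5Borganism%5D`, which matches the `[` and `]` replacements listed in Table 8.3.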
To use customized deflines, we should set the QUERY_BELIEVE_DEFLINE parameter to true (the sample below passes the value 'yes'). The following is a sample search with this setting using lynx. The subsequent 'Get' step retrieves the result in ASN.1 format.
$ echo "CMD=Put&QUERY_BELIEVE_DEFLINE=yes&QUERY=%3Eref|\
NM_000249%0D%0AGACGCCGCCGCCACCACCGCCACCGCCGC\
AGCAGAAGCAGCGCACCGCAGGAGGGAAGATGCCGGCGGGGCACGGGCTGCGGGC\
GCGGACGGCGACCTCTTCGCGCGGCCGTTCCGCAAGAAGGGTTACATCCCGCTCA\
CCACCTACCTGAGGACGTACAAGATCGGCGATTACGTNGACGTCAAGGTGAACGG\
TG&DATABASE=nr&PROGRAM=blastn" \
| lynx "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi" -post_data -mime_header \
| grep "RID ="
RID = AU6TGB0S016
...
$ echo "CMD=Get&RID=AU6TGB0S016&FORMAT_TYPE=ASN.1" | \
lynx "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi" -post_data -mime_header
...
Seq-annot ::= {
...
segs
denseg {
dim 2 ,
numseg 3 ,
ids {
local
str "ref|NM_000249" ,
...
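In a script, the RID printed by the Put step has to be carried into the Get request. The sketch below shows one way to do that, assuming the response contains a line of the form `RID = ...` as in the sample output above. `extract_rid` is a hypothetical helper name, and the `response` variable is a stand-in for the text returned by lynx.

```shell
# extract_rid: hypothetical helper; reads a response on stdin and prints
# the value from the first "RID = ..." line it finds.
extract_rid() {
    sed -n 's/^[[:space:]]*RID = \([A-Za-z0-9]*\).*$/\1/p' | head -n 1
}

# Stand-in for the output of the Put request shown above
response='RID = AU6TGB0S016'

rid=$(printf '%s\n' "$response" | extract_rid)

# Build the POST data for the Get step from the captured RID
printf 'CMD=Get&RID=%s&FORMAT_TYPE=ASN.1\n' "$rid"
```

The last line prints the same POST data that the Get step above pipes into lynx, so it could be piped to lynx in place of the hard-coded `echo`.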