Each utility program accepts a number of command line arguments, each specified using a dash and a single-letter option code followed by an option value. Some values are Boolean and are given as either “T” (true) or “F” (false). Others are one-letter codes, such as format specifiers, or strings, such as file names or GenBank accession numbers. To see a complete list of command line parameters for any of the programs, run the program followed by a single dash and no other parameters. A list of the eight programs with brief descriptions is given in Box 1, and a detailed description of one of the most versatile programs, “asn2all”, follows. In many situations, the multifunctional asn2all can be run instead of asn2fsa, asn2gb or asn2xml.
The program “asn2all” is primarily intended to generate reports from the binary ASN.1 Bioseq-set GenBank release files that are available at:
Depending on the “-f” argument, the program can produce GenBank and GenPept flatfiles, FASTA sequence files, INSDSet structured XML, TinySeq XML, and 5-column feature table formats. Before running asn2all, the GenBank release files, which have an “.aso.gz” suffix, should be uncompressed using a program such as “gunzip”, resulting in files with the suffix “.aso”. For example, uncompressing gbpri1.aso.gz, the first file in the primate division, with gunzip will produce “gbpri1.aso”.
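The decompression step can be applied to a whole directory of release files. The following is a minimal sketch, assuming a POSIX shell and the standard gunzip utility. The function name decompress_release_files and its directory argument are illustrative and are not part of the NCBI programs.

```shell
# Illustrative helper (not part of the NCBI toolkit): decompress every
# GenBank release file (*.aso.gz) in a directory, keeping the originals.
# The directory argument defaults to the current directory.
decompress_release_files() {
    dir="${1:-.}"
    for gz in "$dir"/*.aso.gz; do
        # Skip the unexpanded pattern when no files match.
        [ -e "$gz" ] || continue
        out="${gz%.gz}"    # strip the trailing ".gz" from the file name
        # "gunzip -c" writes to standard output and leaves the input in place.
        gunzip -c "$gz" > "$out" || return 1
        echo "$out"
    done
}
```

Calling the function with a directory name prints the path of each “.aso” file it writes, one per line, so the result can be passed on to later processing steps.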
With asn2all, the name of the file to process is specified with the “-i” command line argument. Use “-a t” to indicate batch processing of a GenBank release file and “-b T” to indicate that the file is binary ASN.1. A text ASN.1 record, such as one obtained on the web from Entrez, can be processed by using “-a a -b F” instead of “-a t -b T”.
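The two flag combinations described above can be kept in one place in a script. This is a small sketch under stated assumptions: the function name asn2all_mode_flags and the labels “binary” and “text” are hypothetical conveniences, and only the flag values quoted in the text are used.

```shell
# Illustrative helper: print the -a/-b flag pair for an input kind.
# "binary" = binary ASN.1 GenBank release file, processed in batch mode.
# "text"   = text ASN.1 record, such as one obtained from Entrez.
asn2all_mode_flags() {
    case "$1" in
        binary) echo "-a t -b T" ;;
        text)   echo "-a a -b F" ;;
        *)      echo "unknown input kind: $1" >&2; return 1 ;;
    esac
}
```

Any other label is rejected with a nonzero exit status, so a typo in a script does not silently fall back to the wrong mode.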
Nucleotide and protein records within ASN.1 records can be processed simultaneously. Use the “-o” argument to indicate the nucleotide output file and the “-v” argument for the protein output file.
The “-f” argument determines the format to be generated. Legal values of “-f” and the resulting formats are:
asn2all -i gbpri1.aso -a t -b T -f g -o gbpri1.nuc -v gbpri1.prt
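The same command line can be generated for several release files at once. The sketch below only prints the commands rather than running them, so that they can be reviewed first. The function name print_asn2all_batch is illustrative, and the output names follow the “.nuc” and “.prt” pattern of the example above as an assumption rather than a requirement of asn2all.

```shell
# Illustrative helper: print (do not run) one asn2all command per
# release file. The first argument is the -f format code; the remaining
# arguments are uncompressed binary release files ending in ".aso".
# Only flags described in the text are used: -i, -a, -b, -f, -o, -v.
print_asn2all_batch() {
    fmt="$1"
    shift
    for aso in "$@"; do
        base="${aso%.aso}"    # gbpri1.aso -> gbpri1
        echo "asn2all -i $aso -a t -b T -f $fmt -o $base.nuc -v $base.prt"
    done
}
```

For instance, print_asn2all_batch g gbpri1.aso prints the command shown above. File names containing spaces would need extra quoting before the printed lines are executed.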
A remote fetching option, “-r T”, allows an ASN.1 record to be downloaded from NCBI over a network connection, using an accession number or NCBI gi number as the identifier. For instance, to download the feature table from the Reference Sequence (RefSeq) record for the Escherichia coli genome via remote fetch, use:
asn2all -r T -A NC_000913 -f t
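When feature tables are needed for several records, the remote fetch command can be generated from a list of accessions. This is a minimal sketch, assuming one accession per line on standard input. The function name print_remote_fetch is hypothetical, and the commands are printed rather than executed so that no network access is needed to try it.

```shell
# Illustrative helper: read accessions (one per line) from standard
# input and print the remote-fetch command for each. Blank lines are
# skipped. Flags are those shown in the text: -r T, -A, -f t.
print_remote_fetch() {
    while IFS= read -r acc; do
        [ -n "$acc" ] || continue
        echo "asn2all -r T -A $acc -f t"
    done
}
```

Feeding the single line NC_000913 to the function reproduces the command shown above.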
The output of this command for the first NC_000913 feature is given below. The 5-column feature table format used is identical to that required as input to generate an ASN.1 sequence file using tbl2asn, described in Box 1.
The eight ASN.1 utility programs may be downloaded at: