AsnLib: ASN.1 Processing


Introduction to ASN.1
AsnLib: Overview
Principles of Operation
Specification for AsnLib
AsnTool
AsnTool Tutorial
Using AsnLib
AsnLib: A Tutorial
Data-links
AsnLib Generated Header Files
Returns From AsnLib Parsing
Finding AsnTypePtrs at Run-time
Custom Read and Write Functions
Customizing an AsnIo Stream
ASN.1 Object Loaders
AsnLib and Object Loaders As a Generalized Iterator
AsnLib and Object Loaders Provide a Generalized Copy and Compare
AsnLib Interface: asn.h


 Introduction to ASN.1

Why ASN.1

Abstract Syntax Notation One (ASN.1) is used to describe the structure of data to be transferred between the Application Layer and the Presentation Layer of the Open Systems Interconnection (OSI) model.  It is meant to provide a mechanism whereby the Presentation Layer can use a single standard encoding to reliably exchange any arbitrary data structure with other computer systems, while the Application Layer can map the standard encoding into any type of representation or language that is appropriate for the end user.  ASN.1 does not describe the content, meaning, or structure of the data, only the way in which it is specified and encoded.

These properties make it an excellent choice for a standard way of encoding scientific data.  Since ASN.1 does not specify content, specifications can be created as new concepts need to be represented.  Yet since it is an International Organization for Standardization (ISO) standard, the new specification can take advantage of various tools built to work with ASN.1 in general.  It removes from scientists the role of specifying ad hoc file formats, and focuses them instead on specifying the content and structure of data necessary to convey scientific meaning.

There are two aspects to ASN.1: the specification of the data and the encoded data itself.  The specification describes the abstract structure of the data and the allowed values various fields may take.  Frequently today scientific data is presented with no formal specification.  There may be some documentation describing the data file, but very often it is incomplete or not entirely accurate, since it is usually written about the file, rather than as an integral step toward building the file.  The ASN.1 specification is a formal language, which means it can be automatically and thoroughly checked for errors and inconsistencies in form by machine before any data are collected at all.  Further, it can be used by a computer to validate that any data presented correctly reflect that specification.  This is essential in eliminating the random errors and oversights in generating data files that plague scientific data now.  A utility program, asntool, was built with the AsnLib libraries to do this sort of checking and validation while developing ASN.1 specifications.

The requirement for a separate specification also means that interested parties can examine and evaluate the structure of the data independent of any particular database or data file.  One can understand the limits and strengths of a specification separately from the quality or amount of the data itself. Data structures that prove to be useful can be re-used in a variety of ways; by large public databases, by small private databases, in various software tools, and in assorted data files.

Finally, a separate specification means software to construct, decode, and validate any ASN.1 specified object can be built semi- or fully automatically from the specification.  Data encoded according to that specification can then be processed with relatively little manual programming for those aspects of the application dealing directly with ASN.1.  This is what the AsnLib routines are for.

Structure of ASN.1

ASN.1 has Type References, identifiers, and values.  A Type Reference is the name of an object defined in an ASN.1 specification.  An identifier is a field within an object.  A value is generally not included in the specification, but rather is the value of a Type Reference or an identifier in data encoded in ASN.1.  Values can be encoded in either a text or a binary form.  The examples here will obviously be in the text form.

Type References ALWAYS start with an upper case letter.  Identifiers ALWAYS start with a lower case letter.  Values depend on what type of value it is (integer, string, etc.) and examples are given below.  "-" (hyphen) is the ONLY separator character allowed in References and identifiers.
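
The lexical rules above are simple enough to check mechanically.  The following is a minimal sketch in C, not part of AsnLib; the names NameKind and classify_asn1_name are hypothetical.  It checks only the rules stated above (case of the leading letter, and hyphen as the sole separator), not the full ASN.1 grammar.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not an AsnLib function: classifies a name using
   only the lexical rules described in the text above. */
typedef enum { NAME_INVALID, NAME_TYPE_REFERENCE, NAME_IDENTIFIER } NameKind;

NameKind classify_asn1_name(const char *name)
{
    size_t i;

    /* A name must start with a letter. */
    if (name == NULL || !isalpha((unsigned char)name[0]))
        return NAME_INVALID;

    /* Remaining characters: letters, digits, or the hyphen separator. */
    for (i = 1; name[i] != '\0'; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!isalnum(c) && c != '-')
            return NAME_INVALID;
    }

    /* Upper case start: Type Reference.  Lower case start: identifier. */
    return isupper((unsigned char)name[0]) ? NAME_TYPE_REFERENCE
                                           : NAME_IDENTIFIER;
}
```

With this sketch, "My-type" classifies as a Type Reference, "first" as an identifier, and "my_type" is rejected because "_" is not an allowed separator.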

ASN.1 allows elements of SET, SEQUENCE, and CHOICE to not have identifiers if they can be distinguished from each other by their type (e.g. one is an integer and one is a string).  However, this can make the text value notation ambiguous and it may also lead to errors in the hands of the novice.  So we REQUIRE that every element of a SET, SEQUENCE, and CHOICE have an identifier.

ASN.1 also allows the specification of numerical tags (used for the binary encoding) in [] in addition to or in lieu of identifiers.  Again, this can be a problem for the novice.  Since we require identifiers, our software generates the numerical tags itself and we can ignore this.  It still supports explicitly defined APPLICATION and PRIVATE tags, but that is beyond the scope of this document.  Comments begin with "--" and end with "--" or the end of the line.

A simple ASN.1 specification module example is shown below:

Demo-module DEFINITIONS ::=              -- Module-name DEFINITIONS ::= BEGIN
BEGIN

EXPORTS My-type;                         -- My-type can be used by other modules

IMPORTS Foreign-type FROM Other-module;  -- can import types

                                         -- we define an object called My-type
My-type ::= SEQUENCE {                   -- My-type is a Type Reference
   first     INTEGER ,                   -- first is an identifier
   second    INTEGER DEFAULT 2 ,         -- second defaults to 2
   third     VisibleString OPTIONAL      -- third is an optional string
   }                                     -- end of object definition

Another ::= Foreign-type                 -- can reference other defined types

END                                      -- end of module, END required

Value notation (or data encoded in the text form of ASN.1) looks like this:

My-type ::= {
   first 42
   }

This means this My-type will have first = 42, second = 2, and third not present.  To present more than one My-type you must have defined:

 

My-type-set ::= SET OF My-type           -- in Demo-module

 

Then you could have:

My-type-set ::= {                        -- start SET OF
   {                                     -- a My-type
       first 42
   } ,
   {                                     -- another My-type
       first 27 ,
       second 22 ,
       third "Everything set here"
   }
}                                        -- end of SET OF

ASN.1 Primitive Types Supported by AsnLib

Each entry gives the type and its description, followed by an example specification and its value notation:

BOOLEAN

Any TRUE or FALSE value

May have a DEFAULT

Truth ::= BOOLEAN

Truth ::= FALSE

INTEGER

Any integer value.

May be given named values but range not limited to names.

May have a DEFAULT.

Number ::= INTEGER

or

Number ::= INTEGER {

     red (1) ,

     blue (2) }

Number ::= 42

or

Number ::= red

OCTET STRING

Any string of bytes.

Returned as or read from ByteStorePtr.

May not have DEFAULT.

Hstring ::= OCTET STRING

Hstring ::= '0A01F'H

NULL

null is only allowed value

Nothing ::= NULL

Nothing ::= null

REAL

Floating point number in base 2 or 10.

REAL value notation is 3 integers for { mantissa, base, exponent }

May have a DEFAULT.

Pi ::= REAL

Pi ::= { 314159, 10, -5 }

ENUMERATED

A named set of integer values.

Only named values allowed.

May have a DEFAULT

Sex ::= ENUMERATED {

     male (1) ,

     female (2) }

Sex ::= male

SEQUENCE

A series of other named types, in order.

Not related to a biological sequence.

All elements must be present unless OPTIONAL or DEFAULT

Yuppie ::= SEQUENCE {

     income   INTEGER ,

     name     VisibleString }

Yuppie ::= {

     income 100000 ,

     name "John Doe" }

SEQUENCE OF

A repeating series of a single type in order.

Stooges ::=

SEQUENCE OF VisibleString

Stooges ::= {

     "Larry" ,

     "Curly",

     "Moe" }

SET

A series of named other types.

Order does not matter.

All elements must be present unless OPTIONAL or DEFAULT

Yuppie ::= SET {

     income   INTEGER ,

     name     VisibleString }

Yuppie ::= {

     income 100000 ,

     name "John Doe" }

SET OF

A repeating series of a single type. Order does not matter.

Stooges ::=

SET OF VisibleString

Stooges ::= {

     "Larry" ,

     "Curly",

     "Moe" }

CHOICE

A way to select one from a set of alternate types.

NOTE:  in the value notation you are indicating one choice, so {} are not necessary (or allowed) but the identifier for the selected CHOICE must be given before the value.

Person ::= CHOICE {

     social-security INTEGER ,

     name VisibleString ,

     badge-id INTEGER }

Person ::= name "Joe"

VisibleString

A string of printable ASCII characters

NOTE: The double quote character (") may be included in a VisibleString by doubling it.
"He said ""Hi Mom!"" to her"
NOTE: AsnLib can accept wrapped long VisibleStrings.  That is, a string may contain internal newlines which are stripped on input from the value notation.
 Text ::= "He said ""Hi Mom!"" to her"
would be read as:
"He said ""Hi Mom!"" to her"

Text ::= VisibleString

Text ::= "Hi Mom!"

StringStore

ONLY in AsnLib. Defines a VisibleString which is read into a ByteStore instead of a CharPtr. Used for long strings like DNA sequences.

Dna ::= StringStore

Dna ::= "AGGAGG"
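
As an illustration of the quote-doubling rule given for VisibleString, here is a minimal sketch in C that turns a C string into its value notation form.  quote_visible_string is a hypothetical helper written for this example, not an AsnLib function, and it does not check that the characters are printable ASCII.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not an AsnLib function.
   Returns a newly allocated copy of text, wrapped in double quotes with
   every embedded '"' doubled.  The caller frees the result.
   Returns NULL if memory cannot be allocated. */
char *quote_visible_string(const char *text)
{
    size_t len = strlen(text);
    /* Worst case: every character doubled, plus two quotes and the NUL. */
    char *out = malloc(2 * len + 3);
    char *p = out;
    size_t i;

    if (out == NULL)
        return NULL;

    *p++ = '"';
    for (i = 0; i < len; i++) {
        if (text[i] == '"')
            *p++ = '"';          /* double the embedded quote */
        *p++ = text[i];
    }
    *p++ = '"';
    *p = '\0';
    return out;
}
```

Passing the C string He said "Hi Mom!" to her yields the value notation shown in the VisibleString entry, with each inner quote doubled.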

Further information about ASN.1

 

The Open Book
A Practical Perspective on OSI
by Marshall T. Rose
Prentice Hall, Englewood Cliffs, New Jersey  07632
(c) 1990

 

ISO Development Environment  (public software)
University of Pennsylvania
Dept. of Computer Science and Information Science
Moore School
Attn: David J. Farber (ISODE Distribution)
200 South 33rd Street
Philadelphia, PA  19104-6314
1-215-898-8560

 

OSIkit Tools from NIST  (1989) (public software)
US Dept. of Commerce
National Institute of Standards and Technology
Gaithersburg, MD

 

Information Processing - Open Systems Interconnection - Specification of Abstract Syntax Notation One (ASN.1).  International Organization for Standardization and International Electrotechnical Commission, 1987.  International Standard 8824.

 

Information Processing - Open Systems Interconnection - Specification of Basic Encoding Rules for Abstract Syntax Notation One (ASN.1).  International Organization for Standardization and International Electrotechnical Commission, 1987.  International Standard 8825.

 

Information Processing - Open Systems Interconnection - Abstract Syntax Notation One (ASN.1) - Draft Addendum 1:  Extensions to ASN.1.  International Organization for Standardization and International Electrotechnical Commission, 1987.  Draft Addendum 8824/DAD 1.

 

Information Processing - Open Systems Interconnection - Abstract Syntax Notation One (ASN.1) - Draft Addendum 1:  Extensions to ASN.1 Basic Encoding Rules.  International Organization for Standardization and International Electrotechnical Commission, 1987.  Draft Addendum 8825/DAD 1.

AsnLib: Overview

AsnLib is a library of functions developed by NCBI for manipulating and exchanging ASN.1 specifications and encoded data for scientific purposes.

A number of commercial and public domain tools are available for working with ASN.1 and for automatically building data handlers of various sorts. They are focused on the use for which ASN.1 was originally intended, the exchange of data between layers of the OSI.  As such they tend to automate the process more than AsnLib does, because the domain of use is much more limited.  The fact that they determine the internal data structures to use and write all the code to handle them themselves is not a big problem in this case.

When ASN.1 is used for scientific data description though, other uses will be made of the encoded data than may have originally been envisaged by the designers of these products.  For example, a scientist will often want an application which scans through a large complicated data structure, and just extracts certain fields for use, or even just counts occurrences of certain values.  A tool which automatically generates large elaborate data structures and lots of code to parse the stream, generate the structures, and store them in memory is inappropriate for such an application.  Further, a scientific application may well wish to manipulate that data in a different language than the tool is written in, such as FORTRAN, PROLOG, or LISP.  These applications may well wish to store the whole data structure from the stream, but they will not wish to use the data structures provided by the tool.

ASN.1 can be used to encode data in two ways, an ASCII human readable form called "value notation" or "print form", and a binary encoding.  ASN.1 has separate standards documents for the syntax (specification rules) and the binary encoding rules (BER, or "Basic Encoding Rules").  This was done on purpose to allow various encoding rules for the same abstract syntax.  The BER is, at this writing, the only official ISO encoding for ASN.1, but several other encodings which are faster or take less space, are under consideration by ISO.  Currently the only binary encoding AsnLib supports is BER.

The value notation or ASCII form of the data is not really an official ISO standard.  It was meant to provide a human readable form of ASN.1 data for development or explication, but not as a standard for data exchange.  Nonetheless, value notation rules are given in the ISO documents for all the data types they describe.  With only a few additional rules, value notation is quite robust for data exchange.  These rules are listed in Appendix 1.  While we do not recommend the ASCII form of ASN.1 encoded data for large amounts of data, it is very useful for developing and testing data representations or for generating ASN.1 values easily from other data files or local databases without specialized tools.  Since the value notation and binary encoded forms of data are completely and reliably interconvertible using AsnLib, there is no problem doing this.

Principles of Operation

AsnLib operates on atomic elements of ASN.1 specified data.  It is built using the NCBI core software tools and this document assumes you have some acquaintance with them.  AsnLib reads or writes strings, integers, etc. with single function calls.  Composite objects such as a SEQUENCE or a SET are read or written with a series of calls to read or write its component parts.  The process is designed to be relatively intuitive even in this case.  One calls a function to start encoding a SEQUENCE, then calls the routines to encode its parts, then calls a function to end encoding the SEQUENCE. NCBI has built functions to read and write such higher level objects in single function calls (described in the chapters on data), which use the low level AsnLib functions described here.

One can read and write any type using only three functions.  They take as arguments the identifier of an ASN.1 encoded stream (binary or ASCII), a pointer to a node in a parse tree (generated from the ASN.1 specification), and a pointer to a union which can hold a value of any type.  AsnLib takes care of encoding each value properly, of checking that all appropriate nodes in the tree are visited in the proper order, and of checking that values are valid for their type; none of this is the concern of the application programmer.  Application programmers must read and understand the ASN.1 specification to make proper use of it, but all the other details of using ASN.1 correctly are handled for them.
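
To make the idea of "a union which can hold a value of any type" concrete, here is a minimal sketch in standard C.  DemoValue, DemoKind, and demo_format are hypothetical names invented for this example; the real definitions live in asn.h and the NCBI core headers and may differ.  The point is only that the union itself does not record which member is valid, so the caller must know the type from elsewhere (in AsnLib, from the parse tree node).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in for the value union.  The layout and most member
   names are assumptions, not the definitions found in asn.h. */
typedef union {
    void   *ptrvalue;    /* strings and byte stores */
    int32_t intvalue;    /* INTEGER, ENUMERATED */
    bool    boolvalue;   /* BOOLEAN */
    double  realvalue;   /* REAL */
} DemoValue;

/* The kind plays the role that the parse tree node plays in AsnLib:
   it tells the code which union member to use. */
typedef enum { DEMO_BOOLEAN, DEMO_INTEGER, DEMO_REAL, DEMO_STRING } DemoKind;

/* Formats the value as text into buf.  Returns the snprintf result,
   or -1 for an unknown kind. */
int demo_format(DemoKind kind, const DemoValue *value, char *buf, size_t size)
{
    switch (kind) {
    case DEMO_BOOLEAN:
        return snprintf(buf, size, "%s", value->boolvalue ? "TRUE" : "FALSE");
    case DEMO_INTEGER:
        return snprintf(buf, size, "%ld", (long)value->intvalue);
    case DEMO_REAL:
        return snprintf(buf, size, "%g", value->realvalue);
    case DEMO_STRING:
        return snprintf(buf, size, "%s", (const char *)value->ptrvalue);
    }
    return -1;
}
```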

The parse tree contains information about the type of every node, its name, its binary tag, allowed values, default values, and the next valid element.  The header file generated from the specification also contains a series of #defines which associate names derived from the ASN.1 specification with pointers to nodes in the parse tree.  Thus one's code would refer to JOURNAL_title, not a pointer to a specific node.  Using these defines means that if an ASN.1 specification is changed, but the names and types of nodes an application cares about have not changed, the application can be updated by just compiling with the new header file.

There are also functions which allow more interpreter-like code to be written.  One function will load an ASN.1 specification from a file, validate it, and build the appropriate parse tree on the fly, rather than at compile time by including a header file.  One can still identify nodes in the tree by name with a function that searches the tree for nodes with names matching a string.  As with all interpreter/compiler trade offs, such an application is slower, but more flexible.

AsnLib assumes that specifications will be written as a collection of smaller modules.  Data types may be declared as IMPORTS or EXPORTS by any module.  Multiple modules which reference each other may be loaded at once into AsnLib or through the interpreter function described above.  It will then link the modules before outputting the header file, thus effectively building a single parse tree containing all the modules.

In another approach, one might build a series of functions which handle the datatypes in a particular module.  Then when one writes code which uses a module which IMPORTS another module type, it is left unlinked in that parse tree and one just calls the appropriate function to read it.  AsnLib contains two functions for temporarily linking, then unlinking local parse subtrees to a parent object parse tree for this purpose.  We have begun to build a library of such modular object functions, so one need not link the whole world of possible datatypes into a single routine or module, or write the basic routines to create, destroy, and exchange such sub-objects.

Specification for AsnLib

AsnLib supports the following types from ISO 8824 and the ASN.1 enhancements.  The internal representation used by AsnLib (from the NCBI core tools) for routines dealing with these types is also shown.

Supported ASN.1 primitive types

Type             Internal representation
BOOLEAN          Boolean
INTEGER          Int4
OCTET STRING     ByteStorePtr
NULL             no value
REAL             FloatHi
ENUMERATED       Int4
SEQUENCE         no value
SEQUENCE OF      no value
SET              no value
SET OF           no value
CHOICE           no value
VisibleString    CharPtr
StringStore      ByteStorePtr
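
REAL values are written in value notation as { mantissa, base, exponent } but handled internally as a floating point number.  The following minimal sketch shows the arithmetic that relates the two, using a plain double in place of FloatHi.  real_from_triple is a hypothetical helper, not an AsnLib function, and it makes no attempt to handle overflow or a zero base.

```c
/* Hypothetical helper, not an AsnLib function.
   Evaluates mantissa * base^exponent by repeated multiplication or
   division, so no math library is needed. */
double real_from_triple(long mantissa, int base, int exponent)
{
    double result = (double)mantissa;
    int i;

    if (exponent >= 0) {
        for (i = 0; i < exponent; i++)
            result *= base;
    } else {
        for (i = 0; i < -exponent; i++)
            result /= base;
    }
    return result;
}
```

For the earlier example Pi ::= { 314159, 10, -5 }, the call real_from_triple(314159, 10, -5) divides 314159 by 10 five times, giving approximately 3.14159.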

Other ASN.1 string types are supported as VisibleString.  No checks are made to ensure restrictions of character usage by the various string types. Types not supported by AsnLib at this point (although they will be accepted in a module specification as valid ASN.1) are:

Unsupported ASN.1 primitive types

BIT STRING

OBJECT IDENTIFIER

ObjectDescriptor

EXTERNAL

ANY

GeneralizedTime

UTCTime

The following keywords are currently supported by AsnLib:

Supported ASN.1 keywords

DEFINITIONS

BEGIN

END

EXPORTS

IMPORTS

FROM

APPLICATION

PRIVATE

UNIVERSAL

DEFAULT

OPTIONAL

FALSE

TRUE

The following ASN.1 keywords are not supported by AsnLib (although they are passed in a module specification as valid ASN.1):

Unsupported ASN.1 keywords

IMPLICIT

ABSENT

BY

COMPONENT

DEFINED

INCLUDES

MIN

MINUS-INFINITY

MAX

PRESENT

PLUS-INFINITY

SIZE

TAGS

WITH

AsnLib uses indefinite encoding for output of all binary encoded non-primitive types.  It can decode either definite or indefinite binary encoded data for all types.  This conforms to the BER.
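
To make the indefinite form concrete, the sketch below encodes a SEQUENCE of small integers with an indefinite length: the SEQUENCE header carries the length octet 0x80 and the contents are terminated by two zero octets, while each primitive INTEGER keeps a definite length.  This is only an illustration under stated assumptions: it uses the UNIVERSAL tags from the BER (0x30 for SEQUENCE, 0x02 for INTEGER), whereas AsnLib generates its own tags, so the bytes AsnLib actually writes will differ.  ber_sequence_indefinite is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper, not an AsnLib function.
   Encodes count integers, each in the range 0..127, as a BER SEQUENCE
   using the indefinite length form.  Returns the number of bytes
   written, or 0 if a value is out of range or out is too small. */
size_t ber_sequence_indefinite(const unsigned char *values, size_t count,
                               unsigned char *out, size_t out_size)
{
    size_t needed = 2 + 3 * count + 2;
    size_t n = 0;
    size_t i;

    if (out_size < needed)
        return 0;
    for (i = 0; i < count; i++)
        if (values[i] > 127)
            return 0;            /* keep every INTEGER to one content octet */

    out[n++] = 0x30;             /* SEQUENCE, constructed */
    out[n++] = 0x80;             /* indefinite length */
    for (i = 0; i < count; i++) {
        out[n++] = 0x02;         /* INTEGER, primitive */
        out[n++] = 0x01;         /* definite length: one octet */
        out[n++] = values[i];
    }
    out[n++] = 0x00;             /* end-of-contents octets */
    out[n++] = 0x00;
    return n;
}
```

For the values 42 and 2 this produces the ten bytes 30 80 02 01 2A 02 01 02 00 00.  The definite form of the same data would be 30 06 02 01 2A 02 01 02, which requires knowing the total content length before the header is written.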

DEFAULT values may be given in an ASN.1 specification.  AsnLib accepts and records them in the parse tree.  However, it does not supply the value if it is missing from the input stream on the assumption that the application would want to distinguish a value actually supplied from a value defaulted locally. DEFAULT is only supported for simple types like INTEGER or VisibleString, but not for structure types like SEQUENCE because it is too difficult to code.

Values may not be assigned in a specification module to types defined in a different module.  Each module is self contained and does not "know" anything about types defined in other modules except their names if they were IMPORTS. So suppose one module defines:

 

Dna-strand ::= ENUMERATED { plus(1), minus(2) }

 

A different module may not use the DEFAULT in the following case:

 Dna-sequence ::= SEQUENCE {

   length INTEGER ,

   strand Dna-strand DEFAULT plus }

 

because it does not know Dna-strand is ENUMERATED or what its allowed values are.  Such a construct is acceptable if the definitions of Dna-strand and Dna-sequence are in the same module and the Dna-strand definition comes first.

AsnLib checks that the elements of a SEQUENCE are received or sent in the correct order and that no element is missing unless it is OPTIONAL or has a DEFAULT.  However, because AsnLib does not store whole structures, for a SET it can only check that the types of the elements are correct; it cannot check whether more than one element of a type is used or whether a required element is missing.  For this reason it is safer, as a rule, to use SEQUENCE rather than SET when using AsnLib.  While there is a semantic difference, there is no representational limitation in doing this.
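
The reason a SEQUENCE can be checked without storing the whole structure is that a single cursor into the specification is enough state.  The sketch below illustrates that idea; it is not AsnLib's implementation, and DemoElement and sequence_order_ok are hypothetical names.  Checking a SET the same way would need a record of every element already seen, which is the stored state the text says AsnLib does not keep.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one SEQUENCE element.  required is 0 for
   elements declared OPTIONAL or DEFAULT, and 1 otherwise. */
typedef struct {
    const char *name;
    int required;
} DemoElement;

/* Returns 1 if the element names, in arrival order, form a valid
   instance of the SEQUENCE described by spec, and 0 otherwise.
   Only one cursor into spec is kept while scanning. */
int sequence_order_ok(const DemoElement *spec, size_t nspec,
                      const char *const *names, size_t nnames)
{
    size_t cursor = 0;
    size_t i;

    for (i = 0; i < nnames; i++) {
        /* Skip over elements that may legally be absent. */
        while (cursor < nspec && strcmp(spec[cursor].name, names[i]) != 0) {
            if (spec[cursor].required)
                return 0;        /* a required element was skipped */
            cursor++;
        }
        if (cursor == nspec)
            return 0;            /* unknown, repeated, or out of order */
        cursor++;
    }

    /* Anything left over must be OPTIONAL or DEFAULT. */
    for (; cursor < nspec; cursor++)
        if (spec[cursor].required)
            return 0;
    return 1;
}
```

Using My-type from the introduction (first required, second DEFAULT, third OPTIONAL), the arrival orders "first" and "first, third" are accepted, while "third, first" is rejected.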

AsnTool

An application program called "asntool" is built by the NCBI Software Toolkit using the AsnLib function libraries, which in turn are based on the NCBI portable core software tools.  This application is a utility program which can:

1.  Read, write, and error check an ASN.1 specification.

2.  Read, write, and check ASCII values conforming to the specification in 1.

3.  Read, write, and check binary values conforming to the specification in 1.

4.  Combine 2 and 3 to translate or convert between binary and ASCII.

5.  Output a C language header file which contains a parse tree for the specification in 1, for use in an application program.

AsnTool Tutorial

It may be quickest to demonstrate the use of AsnLib through example.  In the distribution directory of the NCBI Software Toolkit, \ncbi, there are two subdirectories. \demo contains demonstration source code to be used in the section below and 2 samples of MEDLINE entries as ASN.1 value notation (ASCII).  medline.ent is a single Medline-entry and medline.prt is a Pub-set containing many MEDLINE entries. \asn contains the ASN.1 specifications for the modules used to describe the MEDLINE entries.  They are:

File           Module          Description
general.asn    NCBI-General    general purpose data types
pub.asn        NCBI-Pub        branch point for various publication types
biblio.asn     NCBI-Biblio     standard bibliographic citations for journals, books, manuscripts, patents based on ANSI standard
medline.asn    NCBI-Medline    MEDLINE entry (based on NCBI-Biblio)
asnpub.all     all             all above modules in one file

asntool should have been built as part of installing the system.  It is in \ncbi\bin.  Set your path, or move asntool to a place it can be executed.

From within the \demo directory, run asntool with no arguments.  It presents its argument usage to you.  Note that you must always give a module file name. asntool takes only one module file, so if you wish to use more than one you must concatenate them into a single file, such as asnpub.all.

Try the following exercises -- type:

 

asntool -m ..\asn\asnpub.all

 

This will read the publication modules and validate that they are correctly built.  asntool will notify you of various syntax errors and typos, usually giving the line number where the error occurred.  It makes sure that everything EXPORTS from a module is defined in that module and that everything IMPORTS is used by that module.  Everything not IMPORTS must be defined within the module.  In the case of multiple modules, it will try to link EXPORTS from one module with IMPORTS from others.  It is not an error to be unable to link an IMPORTS, but it does imply you expect it to be handled by an outside function.  There are no errors in asnpub.all, so asntool is silent.  The path may have a different form on various machines.

 

asntool -m ..\asn\asnpub.all -v medline.ent

 

This does everything above, and then reads the file medline.ent which it expects to be of a type defined in asnpub.all.  It checks for errors, reporting any it finds.  There are none, so asntool is silent.

 

asntool -m ..\asn\asnpub.all -v medline.ent -p stdout

 

On command line systems, everything above will happen, except that medline.ent will be encoded from asntool's internal structures to ASN.1 value notation on stdout, your terminal.  On Macintosh or Microsoft Windows, the output will go to a disk file named "stdout".

 

asntool -m ..\asn\asnpub.all -v medline.prt -e medline.val

 

This reads the set of MEDLINE records from medline.prt and encodes them in binary ASN.1 in the file medline.val.

 

asntool -m ..\asn\asnpub.all -d medline.val -t Pub-set -p stdout

 

This reads (decodes) the set of MEDLINE records from the binary ASN.1 file we just made and outputs them as value notation on stdout.  Note that we MUST specify the type (Pub-set) of the binary file or message.  That is because the binary form does not have that information.  The value notation form does, so asntool can figure it out, but the binary, which is the real ISO standard, does not.

 

asntool -m ..\asn\asnpub.all -o allpub.h

 

This outputs a header file for an application which will use the AsnLib routines to encode and decode objects defined in asnpub.all.

Using AsnLib

If you take a look at the allpub.h you generated above, you will see that it includes <asn.h> which defines the interface to the AsnLib library and which includes <ncbi.h> which defines the interface to the NCBI core software tools.

Then the arrays of structures defining the parse tree come.  You should never program directly for these structures as they may change without notice. You should always use the functions described below.

Last come the #defines for pointers to specific nodes in the parse tree. They are built from the names of objects specified in the ASN.1 modules.  The name of the type itself is upper case, and component parts are in lower case. An example of the mapping between the ASN.1 specification medline.asn and the #defines in allpub.h is shown in Appendix 2. 

One less intuitive aspect of this system applies only to SET OF or SEQUENCE OF, which are repeating series of the same type.  Since any one element of such a repeating series does not have a name, one must be invented.  This is done by appending _E (for "Element of") to the parent name.  For example, if Name-list ::= SEQUENCE OF VisibleString, then one element (a VisibleString) of that SEQUENCE OF would have a #defined node name of NAME_LIST_E.  Names defined this way are limited to a maximum of 31 characters.  If they grow longer than that, the leftmost characters are truncated.  The suggestion is: keep names as short as you can while still being meaningful.  Also, since "-" is the only valid separator character in ASN.1 but "_" is the only valid separator character in C, the Name-list node mentioned above would be defined in the parse tree as NAME_LIST.
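
The naming rules just described can be sketched as a small function.  This is an illustration of the mapping as the text describes it, not the code asntool uses; demo_node_name is a hypothetical name, and the 31 character limit with its truncation rule is deliberately not modeled.  The example MEDLINE_MESH_term assumes a type named Medline-mesh with a component named term.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of asntool or AsnLib.
   Builds a node name: the type name in upper case with '-' replaced by
   '_', optionally followed by '_' and a component name whose case is
   kept.  Pass component = NULL for the type itself, or "E" for an
   element of a SET OF or SEQUENCE OF.
   Returns 1 on success, 0 if out is too small. */
int demo_node_name(const char *type_name, const char *component,
                   char *out, size_t out_size)
{
    size_t n = 0;
    const char *p;

    if (out_size == 0)
        return 0;

    for (p = type_name; *p != '\0'; p++) {
        if (n + 1 >= out_size)
            return 0;            /* no room for this char plus the NUL */
        out[n++] = (*p == '-') ? '_' : (char)toupper((unsigned char)*p);
    }

    if (component != NULL) {
        if (n + 1 >= out_size)
            return 0;
        out[n++] = '_';
        for (p = component; *p != '\0'; p++) {
            if (n + 1 >= out_size)
                return 0;
            out[n++] = (*p == '-') ? '_' : *p;
        }
    }

    out[n] = '\0';
    return 1;
}
```

Under these assumptions, ("Name-list", NULL) gives NAME_LIST, ("Name-list", "E") gives NAME_LIST_E, and ("Medline-mesh", "term") gives MEDLINE_MESH_term.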

ASN.1 encoded values are represented basically as identifier/value pairs. AsnLib has two parsing functions that correspond to the members of the pair:

atp = AsnReadId(aip, amp, atp);

    Reads an identifier from an input stream (aip) and returns a pointer to the appropriate node in the parse tree for it (atp as the return value).  atp will be one of the nodes #defined in the header generated by AsnLib.

 

success = AsnReadVal(aip, atp, avp);

      Reads the value of atp from the stream (aip) into an AsnValue (a union of Pointer, Int4, Boolean, FloatHi).  If AsnReadVal() is called with avp = NULL, it skips over that value.  This is useful for scanning through a file extracting only a few fields.

To parse then, one basically just alternates AsnReadId() and AsnReadVal(). The most common error to make in writing a parser that uses these functions is to get out of synchronization alternating between these two routines.

There is only one function to write an identifier/value pair at once:

success = AsnWrite(aip, atp, avp);

      Writes the identifier pointed to by atp, and the value in avp, to the stream aip.

AsnLib: A Tutorial

In \ncbi\demo are three small demo applications that process medline entries and require the allpub.h header and the binary form of medline.prt we built in the sections above.  The make files for Microsoft C (makedemo.msc) and for all UNIX systems (makedemo.unx) are in \make.  Copy the makedemo file appropriate for your system into \ncbi\build and make it.

getmesh.c

Function:  Reads a Medline-entry, extracts the MeSH terms, and prints them.

Type "getmesh -" to see its arguments.

Type "getmesh -i medline.ent -o terms.out".  getmesh reads medline.ent, which contains a single Medline-entry in value notation (ASCII).  This file is presented at the end of this chapter, somewhat abbreviated, with the #defined names for the nodes in the allpub.h parse tree that will be encountered in the course of reading this file. getmesh parses it, extracts the MeSH terms and prints them in "terms.out".

Look at the source code in getmesh.c.

/*****************************************************************************
*
*   getmesh.c
*      gets mesh terms from a Medline-entry
*
*****************************************************************************/
#include <allpub.h>

#define NUMARGS 3
Args myargs[NUMARGS] = {
   { "Input data", NULL, "Medline-entry", NULL, FALSE, 'i', ARG_DATA_IN, 0.0,0,NULL},
   { "Input data is binary", "F", NULL, NULL, TRUE , 'b', ARG_BOOLEAN, 0.0,0,NULL},
   { "Output list", NULL, NULL, NULL, FALSE, 'o', ARG_FILE_OUT, 0.0,0,NULL}};

Int2 Main()
{
   AsnIoPtr aip;
   AsnTypePtr atp;
   DataVal value;
   static CharPtr intypes[2] = { "r", "rb" };
   Int2 intype;
   FILE *fp;

   if (! AsnLoad())
      Message(MSG_FATAL, "Unable to load allpub parse tree.");

   if (! GetArgs("GetMesh 1.0", NUMARGS, myargs))
      return 1;

   if (myargs[1].intvalue)        /* binary input is TRUE */
      intype = 1;
   else
      intype = 0;

   if ((aip = AsnIoOpen(myargs[0].strvalue, intypes[intype])) == NULL)
   {
      Message(MSG_ERROR, "Couldn't open %s", myargs[0].strvalue);
      return 1;
   }

   if ((fp = FileOpen(myargs[2].strvalue, "w")) == NULL)
   {
      Message(MSG_ERROR, "Couldn't open %s", myargs[2].strvalue);
      return 1;
   }

   atp = MEDLINE_ENTRY;

   fprintf(fp, "MeSH terms =\n\n");
   while ((atp = AsnReadId(aip, amp, atp)) != NULL)
   {
      if (atp == MEDLINE_MESH_term)
      {
         AsnReadVal(aip, atp, &value);
         FilePuts(value.ptrvalue, fp);
         FilePuts("\n", fp);
         AsnKillValue(atp, &value);
      }
      else
         AsnReadVal(aip, atp, NULL);
   }

   aip = AsnIoClose(aip);

   FileClose(fp);

   return 0;
}

Pretty short for doing all this, isn't it?  Walking through the code:

0.             AsnLoad() is called to load the ASN.1 parse tree for "allpub" into memory.

1.             GetArgs() is called to display or get the command line arguments.

2.             The appropriate string is selected for opening a value notation ("r") or a binary ("rb") input stream.

3.             The input stream is opened with AsnIoOpen().

4.             The file for printed output is opened.

5.             atp is initialized to MEDLINE_ENTRY, the defined node we expect the input stream to start with.  If the input stream were ALWAYS value notation, atp could be set to NULL, and Medline-entry ::= would be read from the input file and atp set correctly.  Since getmesh takes binary and value notation, atp must be properly initialized.

6.             The main while loop just reads identifiers with AsnReadId() until it returns NULL, which is EOF.  The argument, amp, is the AsnModulePtr declared in allpub.h.  It is used to locate the appropriate AsnTypePtr (atp) if it was set to NULL on the first call.  After that, atp provides the link to the parse tree.

7.             In the while loop, a check is made to see if atp == MEDLINE_MESH_term, the VisibleString containing a single MeSH term.  If so, we read the value with AsnReadVal(), print it, then call AsnKillValue(), which deallocates any storage used when the value was read.  Since a VisibleString requires storage, this is necessary.  There is no harm in calling AsnKillValue() even on types that do not allocate storage (e.g., INTEGER).

8.             If it's not a MeSH term, we call AsnReadVal() with a NULL argument for the AsnValuePtr, which just skips over the value to the next identifier.

9.             We close the streams.

10.           That's all.

indexpub.c

Function: Builds an index to medline.ent based on Medline Unique Identifier.

Type "indexpub -" to see the arguments.

Type "indexpub -imedline.val".  indexpub will read the binary value file, medline.val, note the seek offset of the start of each Medline-entry it contains, identify the Medline uid for it, and build an index file, "medline.idx".

Take a look at the source code, indexpub.c.

/*****************************************************************************

*

*   indexpub.c

*      indexes a Pub-set by Medline UID

*

*****************************************************************************/

#include <allpub.h>

 

#define NUMARGS 3

Args myargs[NUMARGS] = {

   { "Input data", "medline.val", "Pub-set", NULL, FALSE, 'i', ARG_DATA_IN, 0.0,0,NULL},

   { "Input data is binary", "T", NULL, NULL, TRUE , 'b', ARG_BOOLEAN, 0.0,0,NULL},

   { "Output index table", "medline.idx", NULL, NULL, FALSE, 't', ARG_FILE_OUT, 0.0,0,NULL}};

 

 

Int2 Main()

{

   AsnIoPtr aip;

   AsnTypePtr atp;

   DataVal value;

   Int4 seekptr, tempseek, uid;

   static CharPtr intypes[2] = { "r", "rb" };

   Int2 intype;

   FILE *fp;

 

    if (! AsnLoad())

        Message(MSG_FATAL, "Unable to load allpub parse tree.");

 

   if (! GetArgs("IndexPub 1.0", NUMARGS, myargs))

       return 1;

 

   if (myargs[1].intvalue)        /* binary input is TRUE */

       intype = 1;

   else

       intype = 0;

 

   if ((aip = AsnIoOpen(myargs[0].strvalue, intypes[intype])) == NULL)

   {

       Message(MSG_ERROR, "Couldn't open %s", myargs[0].strvalue);

       return 1;

   }

 

   if ((fp = FileOpen(myargs[2].strvalue, "w")) == NULL)

   {

       Message(MSG_ERROR, "Couldn't open %s", myargs[2].strvalue);

       return 1;

   }

 

   atp = PUB_SET;

   tempseek = 0L;

 

   while ((atp = AsnReadId(aip, amp, atp)) != NULL)

   {

       if (atp == PUB_SET_medline_E)

          seekptr = tempseek;

       if (atp == MEDLINE_ENTRY_uid)

       {

          AsnReadVal(aip, atp, &value);

          uid = value.intvalue;

          fprintf(fp, "%ld %ld\n", uid, seekptr);

       }

       else

          AsnReadVal(aip, atp, NULL);

       tempseek = AsnIoTell(aip);

   }

 

   aip = AsnIoClose(aip);

   FileClose(fp);

   return 0;

 

}

It is the same basic structure as getmesh.c.  However, the use of the while loop is a little different.  Since we are building an index, we want to record the offset in the file of the identifier which starts each Medline entry in the Pub-set (PUB_SET_medline_E: a PUB_SET of type medline is a SET OF Medline-entry).  So tempseek is set (to 0 to begin with, then with AsnIoTell()) BEFORE each read of an identifier with AsnReadId().  When the return value is PUB_SET_medline_E we know that tempseek contains the seek offset just before the first identifier for the Medline-entry, and we save it in seekptr.  Then we read through the entry looking for MEDLINE_ENTRY_uid, since we want to index on the MEDLINE Unique Identifier.  When we find it, we store the uid and the seek offset in the index file.  All other values are skipped.
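The offset-before-read idea can be tried without AsnLib at all. The following is a minimal sketch in standard C, assuming a hypothetical text format with one "uid text" record per line; IndexEntry, build_index(), and fetch_record() are illustrative names, not part of the toolkit. ftell() plays the role of AsnIoTell() and fseek() the role of AsnIoSeek().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record format for illustration only: one "uid text" line
 * per record.  The real indexpub walks ASN.1 identifiers instead. */
#define MAX_RECS 16

typedef struct {
    long uid;      /* key found inside the record              */
    long offset;   /* seek offset of the start of that record  */
} IndexEntry;

/* Builds an index over fp; returns the number of entries stored. */
int build_index(FILE *fp, IndexEntry *idx, int max)
{
    char line[128];
    int n = 0;
    long start;

    assert(fp != NULL && idx != NULL);
    start = ftell(fp);               /* offset BEFORE the first read */

    while (n < max && fgets(line, sizeof line, fp) != NULL) {
        long uid;
        if (sscanf(line, "%ld", &uid) == 1) {
            idx[n].uid = uid;
            idx[n].offset = start;   /* offset saved before this read */
            n++;
        }
        start = ftell(fp);           /* remember offset before next read */
    }
    return n;
}

/* Seeks to the record for uid and copies its line into buf.
 * Returns 1 on success, 0 if the uid is unknown or the read fails. */
int fetch_record(FILE *fp, const IndexEntry *idx, int n, long uid,
                 char *buf, int buflen)
{
    int i;

    for (i = 0; i < n; i++) {
        if (idx[i].uid == uid) {
            if (fseek(fp, idx[i].offset, SEEK_SET) != 0)
                return 0;
            return fgets(buf, buflen, fp) != NULL;
        }
    }
    return 0;
}
```

As in indexpub.c, the offset has to be captured before the read that reveals the key, because by the time the key is known the stream has already moved past the start of the record.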

getpub.c

Function: Uses the index created by indexpub.c to retrieve a Medline-entry from medline.val by Medline uid.

/*****************************************************************************

*

*   getpub.c

*      does an indexed lookup for medline entries by medline uid

*

*****************************************************************************/

#include "allpub.h"

 

#define NUMARGS 5

Args myargs[NUMARGS] = {

   { "Input binary data", "medline.val", "Pub-set", NULL, FALSE, 'i', ARG_DATA_IN, 0.0,0,NULL},

   { "Medline UID to find", "88055872", NULL,NULL,FALSE,'u', ARG_INT, 0.0, 0, NULL },

   { "Input index table", "medline.idx", NULL,NULL,FALSE,'t', ARG_FILE_IN, 0.0,0,NULL },

   { "Output data", "stdout", "Medline-entry",NULL,FALSE,'o',ARG_DATA_OUT, 0.0,0,NULL},

   { "Output data is binary", "F", NULL, NULL, FALSE , 'b', ARG_BOOLEAN, 0.0,0,NULL}};

 

 

Int2 Main()

{

   AsnIoPtr aip, aipout;

   AsnTypePtr atp;

   DataVal value;

   Int4 seekptr, uid, uid_to_find;

   static CharPtr outtypes[2] = { "w", "wb" };

   Int2 outtype;

   FILE *fp;

   Boolean done, first;

   int retval;

 

    if (! AsnLoad())

        Message(MSG_FATAL, "Unable to load allpub parse tree.");

 

   if (! GetArgs("GetPub 1.0", NUMARGS, myargs))

       return 1;

 

   if (myargs[4].intvalue)        /* binary output is TRUE */

       outtype = 1;

   else

       outtype = 0;

 

   if ((aip = AsnIoOpen(myargs[0].strvalue, "rb")) == NULL)

   {

       Message(MSG_ERROR, "Couldn't open %s", myargs[0].strvalue);

       return 1;

   }

 

   if ((aipout = AsnIoOpen(myargs[3].strvalue, outtypes[outtype])) == NULL)

   {

       Message(MSG_ERROR, "Couldn't open %s", myargs[3].strvalue);

       return 1;

   }

 

   if ((fp = FileOpen(myargs[2].strvalue, "r")) == NULL)

   {

       Message(MSG_ERROR, "Couldn't open %s", myargs[2].strvalue);

       return 1;

   }

 

   uid_to_find = myargs[1].intvalue;

   done = FALSE;

   first = TRUE;

   while (! done)

   {

       retval = fscanf(fp, "%ld %ld", &uid, &seekptr);

       if (retval == EOF)

       {

          Message(MSG_ERROR, "UID %ld not found", uid_to_find);

          return 1;

       }

       if (uid == uid_to_find)

          done = TRUE;

   }

   FileClose(fp);

 

   atp = MEDLINE_ENTRY;

   AsnIoSeek(aip, seekptr);

   done = FALSE;

   while (! done)

   {

       atp = AsnReadId(aip, amp, atp);

       AsnReadVal(aip, atp, &value);

       AsnWrite(aipout, atp, &value);

       AsnKillValue(atp, &value);

 

       if (! first)

       {

          if (atp == MEDLINE_ENTRY)

              done = TRUE;

       }

       else

          first = FALSE;

   }

 

   AsnIoClose(aip);

   AsnIoClose(aipout);

 

   return 0;

}

This is a very simple program.  It looks up the seek offset into the file by uid, and seeks to that point with AsnIoSeek().  It then just cycles through the process of reading an identifier then reading a value from medline.val using AsnReadId() and AsnReadVal().  It then writes them both to the output file with AsnWrite().  Any storage used is freed with AsnKillValue(). Depending on the way the output AsnIo stream is opened, ASCII or binary, the program can deliver a binary Medline-entry or an ASCII conversion of it.
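The index lookup loop in getpub.c stops only when fscanf() reports EOF. A slightly more defensive variant, sketched below in standard C, treats anything other than two converted fields as the end of the search, so a malformed index line cannot leave the loop spinning. The function name lookup_offset() is hypothetical and the index format ("uid offset" pairs) is the one written by indexpub.c.

```c
#include <assert.h>
#include <stdio.h>

/* Scans "uid offset" pairs from an index stream.
 * Returns the seek offset for uid_to_find, or -1 if the uid is absent
 * or the index is malformed. */
long lookup_offset(FILE *idx, long uid_to_find)
{
    long uid, offset;
    int got;

    assert(idx != NULL);
    while ((got = fscanf(idx, "%ld %ld", &uid, &offset)) == 2) {
        if (uid == uid_to_find)
            return offset;
    }
    /* got is EOF at end of file, or 0 or 1 when a line could not be
     * parsed; either way the search is over. */
    return -1L;
}
```

The caller would then pass a non-negative result to AsnIoSeek() exactly as getpub.c does with seekptr.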

One important point to note is the way the while loop knows when it has finished reading a MEDLINE_ENTRY.  Since it is a SEQUENCE, which is basically a structure with component parts, AsnReadId() returns atp == MEDLINE_ENTRY twice: once when it reads the start of the structure, and once when it reads the end.  If you imagine the MEDLINE_ENTRY being bounded by braces {} as in the value notation, the process is this:

MEDLINE_ENTRY ::= { AsnReadId() gets MEDLINE_ENTRY, AsnReadVal() gets {

    one ,                         ( read the internal components )

    two

   }                AsnReadId() gets MEDLINE_ENTRY, AsnReadVal() gets }

To produce the same effect on output, there are two extra output functions for AsnLib, in addition to AsnWrite().

AsnOpenStruct(aip, atp, ptr)

                Writes the first instance of atp on the output stream aip at the beginning of a structure (SEQUENCE, SET, SEQUENCE OF, SET OF).

 

AsnCloseStruct(aip, atp, ptr)

                Writes the second, closing instance.

The "ptr" argument is a pointer to the internal C structure representing the ASN.1 structure. It is used by functions that piggyback on the AsnWrite functions to explore the internal objects (discussed below).

For this reason a similar function is provided to write a CHOICE.

AsnWriteChoice(aip, atp, choice, value)

                Writes a choice of types. The choice argument is an integer that indicates which type will be written at the next AsnWrite(), and value is a DataVal that can carry the internal C structure used to represent the choice.

 In the case of getpub.c, it is not necessary to call these functions because getpub is simply reading the data from an ASN.1 stream then writing it again in order, which includes the two instances of MEDLINE_ENTRY.

Another point about this program is that we recognized the Medline entries in the Pub-set in indexpub.c by looking for PUB_SET_medline_E, but we are reading and writing the same entry in getpub.c using MEDLINE_ENTRY.  That is because a Pub-set of CHOICE medline is defined as a SET OF Medline-entry.  So when reading the whole Pub-set, each Medline-entry is a PUB_SET_medline_E. But when reading one entry it is a MEDLINE_ENTRY.

Data-links

Data-links are described in the NCBI Core Tools document.  They are meant to be "ports" in and out of software applications which perform exchange of structured data (in ASN.1).  The inputs and outputs for getpub.c and getmesh.c are actually Data-links.  If you simply type the command:

 

getpub -u 88055872 -b -o stdout | getmesh -i stdin -b -o terms.out

 

you have executed a pair of programs which communicate over a Data-link with structured, binary encoded ASN.1.  getpub extracts the Medline-entry with uid = 88055872 from a binary encoded Pub-set by indexed look-up and transfers it through stdout, as a binary Medline-entry, to getmesh, which parses the "message", locates the MeSH terms, and prints them to terms.out.

This example is just a pipe between two programs, with the enhancement that the stream is binary coded ASN.1, which permits a very much richer "vocabulary" for the exchange than is usual for traditional pipes.  Further, since binary coded ASN.1 is a machine independent coding, the exchange could just as easily have been between two completely different machines over a network.  Finally, this pipe is a single channel of exchange.  The principles hold if one expands the system to many channels, by a variety of means.

AsnLib Generated Header Files

Correspondence between ASN.1 and header #defines

Medline-entry ::= SEQUENCE {                    MEDLINE_ENTRY

   uid INTEGER ,                                      MEDLINE_ENTRY_uid

   em Date ,                                          MEDLINE_ENTRY_em

   cit Cit-art ,                                      MEDLINE_ENTRY_cit

   abstract VisibleString OPTIONAL ,                  MEDLINE_ENTRY_abstract

   mesh SET OF Medline-mesh OPTIONAL ,                MEDLINE_ENTRY_mesh

   substance SET OF Medline-rn OPTIONAL ,             MEDLINE_ENTRY_substance

   xref SET OF Medline-si OPTIONAL ,                  MEDLINE_ENTRY_xref

   idnum SET OF VisibleString OPTIONAL }        MEDLINE_ENTRY_idnum

 

Medline-mesh ::= SEQUENCE {              MEDLINE_MESH

   mp BOOLEAN DEFAULT FALSE ,                         MEDLINE_MESH_mp

   term VisibleString ,                               MEDLINE_MESH_term

   qual SET OF Medline-qual OPTIONAL }                MEDLINE_MESH_qual

Returns From AsnLib Parsing

Medline-entry with header #defines as returned when parsing with AsnLib

Medline-entry ::= {                    /MEDLINE_ENTRY

  uid 88055872 ,                      |   MEDLINE_ENTRY_uid

  em                                  |   MEDLINE_ENTRY_em

    std {                             |    /DATE_std

      year 1988 ,                     |   |   DATE_STD_year

      month 3                         |   |   DATE_STD_month

    } ,                               |    \DATE_std

  cit {                               |  /MEDLINE_ENTRY_cit

    title {                           | |  /CIT_ART_title

      name "Developmental .. protein."| | |   TITLE_name

    } ,                                | |  \CIT_ART_title

    authors {                         | |  /CIT_ART_authors

      names                           | | |  AUTH_LIST_names

        ml {                          | | |   /AUTH_LIST_names_ml

          "Giebel LB" ,               | | |  |   AUTH_LIST_names_ml_E

          "Dworniczak BP" ,           | | |  |   AUTH_LIST_names_ml_E

          "Bautz EK"                  | | |  |   AUTH_LIST_names_ml_E

        } ,                           | | |   \AUTH_LIST_names_ml

      affil                           | | |    AUTH_LIST_affil

        str "Zentrum ... Germany"     | | |      AFFIL_str

    } ,                               | |  \CIT_ART_authors

    from                              | |   CIT_ART_from

      journal {                       | |    /CIT_ART_from_journal

        title {                       | |   |  /CIT_JOUR_title

          ml-jta "Dev Biol"           | |   | |   TITLE_ml_jta

        } ,                           | |   |  \CIT_JOUR_title

        imp {                         | |   |  /CIT_JOUR_imp

          date                        | |   | |   IMPRINT_date

            std {                     | |   | |    /DATE_std

              year 1988 ,             | |   | |   |   DATE_STD_year

              month 1                 | |   | |   |   DATE_STD_month

            } ,                       | |   | |    \DATE_std

          volume "125" ,              | |   | |   IMPRINT_volume

          issue "1" ,                 | |   | |   IMPRINT_issue

          pages "200-7"               | |   | |   IMPRINT_pages

        }                             | |   |  \CIT_JOUR_imp

      }                               | |    \CIT_ART_from_journal

  },                                  |  \MEDLINE_ENTRY_cit

  abstract "Multiple ... protein." ,  |   MEDLINE_ENTRY_abstract

  mesh {                              |  /MEDLINE_ENTRY_mesh

    {                                 | |  /MEDLINE_ENTRY_mesh_E

      term "Amino Acid Sequence"      | | |   MEDLINE_MESH_term

    } ,                               | |  \MEDLINE_ENTRY_mesh_E

    {                                  | |  /MEDLINE_ENTRY_mesh_E

      term "Clathrin" ,               | | |   MEDLINE_MESH_term

      qual {                          | | |  /MEDLINE_MESH_qual

        {                             | | | |  /MEDLINE_QUAL

          subh "metabolism"           | | | | |   MEDLINE_QUAL_subh

        }                             | | | |  \MEDLINE_QUAL

      }                               | | |  \MEDLINE_MESH_qual

    } ,                               | |  \MEDLINE_ENTRY_mesh_E

    {                                 | |  /MEDLINE_ENTRY_mesh_E

      term "Heat-Shock Proteins" ,    | | |   MEDLINE_MESH_term

      qual {                          | | |  /MEDLINE_MESH_qual

        {                             | | | |  /MEDLINE_QUAL

          mp TRUE ,                   | | | | |   MEDLINE_QUAL_mp

          subh "genetics"             | | | | |   MEDLINE_QUAL_subh

        }                             | | | |  \MEDLINE_QUAL

      }                               | | |  \MEDLINE_MESH_qual

    }                                 | |  \MEDLINE_ENTRY_mesh_E

  } ,                                 |  \MEDLINE_ENTRY_mesh

  substance {                         |  /MEDLINE_ENTRY_substance

    {                                 | |  /MEDLINE_ENTRY_substance_E

      type cas ,                      | | |   MEDLINE_RN_type

      cit "9007-49-2" ,               | | |   MEDLINE_RN_cit

      name "DNA"                      | | |   MEDLINE_RN_name

    }                                 | |  \MEDLINE_ENTRY_substance_E

  } ,                                 |  \MEDLINE_ENTRY_substance

  xref {                              |  /MEDLINE_ENTRY_xref

    {                                 | |  /MEDLINE_ENTRY_xref_E

      type genbank ,                  | | |   MEDLINE_SI_type

      cit "M19141"                    | | |   MEDLINE_SI_cit

    }                                 | |  \MEDLINE_ENTRY_xref_E

  }                                   |  \MEDLINE_ENTRY_xref

}                                      \MEDLINE_ENTRY

Finding AsnTypePtrs at Run-time

The #defines described above are statically defined in a header file. But sometimes one must find the parse tree nodes (asntypes) from a module which does not include the parse tree itself. If all parse trees have been loaded using the AsnLoad() functions in the modules that include the parse trees, then they are globally accessible by name through a number of functions. AsnFind() takes a string with the name of an ASN.1 specified entity or a partial path (sub-entities separated by dots) to the entity and returns a pointer to its type node. For example,

AsnTypePtr atp;

 

   atp = AsnFind("Seq-entry.location");

will return the same pointer #defined as SEQ_ENTRY_location in the parse tree header file.

Other functions will return information about types at run-time. Using the atp obtained above for Seq-entry.location, which is a "Seq-loc", which is itself defined as the primitive type CHOICE:

CharPtr str;

 

   str = AsnFindPrimName(atp);    /* returns "CHOICE" */

   str = AsnFindBaseName(atp);    /* returns "Seq-loc"  */

For an ENUMERATED type one can get the values at run-time. For the ASN.1 specification:

Sex ::= ENUMERATED {

   male (1) ,

   female (2) };

the following code can be used:

AsnTypePtr atp;

CharPtr str;

 

   atp = AsnFind("Sex");

   str = AsnEnumTypeStr(atp, 2);     /* returns "female" */

   str = AsnEnumStr("Sex", 2);       /* also returns "female" */
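Conceptually, a run-time enumeration lookup is a search of a table of named values. The sketch below shows that idea in standard C for the Sex example; NamedValue, sex_values, and named_value_str() are hypothetical names for illustration and say nothing about how AsnLib stores enumerations internally.

```c
#include <assert.h>
#include <stddef.h>

/* One named value of an ENUMERATED type (illustrative layout). */
typedef struct {
    const char *name;
    int         value;
} NamedValue;

/* Mirrors:  Sex ::= ENUMERATED { male (1) , female (2) } */
static const NamedValue sex_values[] = {
    { "male",   1 },
    { "female", 2 }
};

/* Returns the name bound to value, or NULL if the value is not defined. */
const char *named_value_str(const NamedValue *table, size_t count, int value)
{
    size_t i;

    assert(table != NULL);
    for (i = 0; i < count; i++) {
        if (table[i].value == value)
            return table[i].name;
    }
    return NULL;
}
```

Returning NULL for an undefined value lets the caller distinguish "no such value" from an empty name.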

Custom Read and Write Functions

The AsnLib read and write functions can be replaced to provide custom I/O using the AsnIoNew() function. This is how the NCBI network client/servers are implemented, by replacing the read and write functions with socket based routines. We have also used it to write blocks of ASN.1 in memory buffers for transfer in and out of databases. This is not normally something done by a novice, but several functions which read and write to memory are given in the toolkit as models of how to do this sort of thing.

   /*** read and write to memory buffer ***/

extern AsnIoMemPtr AsnIoMemOpen PROTO((CharPtr mode, BytePtr buf, Uint2 size));

extern AsnIoMemPtr AsnIoMemClose PROTO((AsnIoMemPtr aimp));

extern Boolean AsnIoMemReset PROTO((AsnIoMemPtr aimp, Uint2 bytes_to_read));

extern Int2 AsnIoMemRead PROTO((Pointer, CharPtr, Uint2));

extern Int2 AsnIoMemWrite PROTO((Pointer, CharPtr, Uint2));

 

   /*** read and write to a ByteStore in memory ***/

extern AsnIoBSPtr AsnIoBSOpen PROTO((CharPtr mode, ByteStorePtr bsp));

extern AsnIoBSPtr AsnIoBSClose PROTO((AsnIoBSPtr aibp));

extern Int2 AsnIoBSRead PROTO((Pointer, CharPtr, Uint2));

extern Int2 AsnIoBSWrite PROTO((Pointer, CharPtr, Uint2));
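The mechanism behind replaceable I/O is a stream that holds a function pointer and an opaque state pointer. The sketch below illustrates this in standard C with a reader that serves bytes from memory. Stream, MemSource, mem_read(), and stream_read() are hypothetical; only the general shape of the read function (state pointer, buffer, count) is modeled on the prototypes above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; these are NOT the AsnLib definitions. */
typedef int (*ReadFunc)(void *iostruct, char *buf, unsigned count);

typedef struct {
    void    *iostruct;   /* opaque state handed back to the read function */
    ReadFunc readfunc;   /* replaceable low-level reader                   */
} Stream;

typedef struct {
    const char *data;    /* memory block being read */
    size_t      size;    /* total bytes in data     */
    size_t      pos;     /* next byte to deliver    */
} MemSource;

/* Reader that serves bytes from a memory block instead of a file.
 * Returns the number of bytes copied, 0 at end of data. */
int mem_read(void *iostruct, char *buf, unsigned count)
{
    MemSource *src = (MemSource *)iostruct;
    size_t left = src->size - src->pos;
    size_t n = count < left ? count : left;

    memcpy(buf, src->data + src->pos, n);
    src->pos += n;
    return (int)n;
}

/* Stream-level read: the caller never knows where the bytes come from. */
int stream_read(Stream *s, char *buf, unsigned count)
{
    assert(s != NULL && s->readfunc != NULL);
    return s->readfunc(s->iostruct, buf, count);
}
```

Swapping mem_read() for a socket-based function with the same signature changes the data source without touching any code that calls stream_read().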

 

Customizing an AsnIo Stream

Sometimes one wishes to change the details of a series of functions at run-time. This can be accomplished by attaching AsnOption structures to the stream. These form a linked list of structures which carry user defined data and are identified by user defined class and type values. A series of functions allow the options to be added, removed, or located on a stream pointer. These are used to customize the behavior of the object loaders (see below) under different run-time conditions, but have many other uses as well. AsnOptions are not the same as AsnExpOptStructs, or exploration structures used by the generalized iterator described below.
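A linked list of options keyed by class and type can be sketched as follows in standard C. The Option structure and the option_add(), option_find(), and option_remove() functions are hypothetical stand-ins chosen for illustration; the actual AsnOption layout and function names are given in asn.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical option node; the real AsnOption layout may differ. */
typedef struct option {
    int            ao_class;  /* user-defined class                    */
    int            ao_type;   /* user-defined type                     */
    void          *data;      /* user-defined payload (not owned here) */
    struct option *next;
} Option;

/* Pushes a new option on the front of the list.  Returns the new head,
 * or the old head unchanged if allocation fails. */
Option *option_add(Option *head, int ao_class, int ao_type, void *data)
{
    Option *op;

    assert(data != NULL);   /* this sketch treats a missing payload as a bug */
    op = (Option *)malloc(sizeof *op);
    if (op == NULL)
        return head;
    op->ao_class = ao_class;
    op->ao_type = ao_type;
    op->data = data;
    op->next = head;
    return op;
}

/* Returns the first option matching class and type, or NULL. */
Option *option_find(Option *head, int ao_class, int ao_type)
{
    for (; head != NULL; head = head->next) {
        if (head->ao_class == ao_class && head->ao_type == ao_type)
            return head;
    }
    return NULL;
}

/* Unlinks and frees every option matching class and type.
 * Returns the (possibly new) head of the list. */
Option *option_remove(Option *head, int ao_class, int ao_type)
{
    Option **link = &head;

    while (*link != NULL) {
        Option *op = *link;
        if (op->ao_class == ao_class && op->ao_type == ao_type) {
            *link = op->next;
            free(op);
        } else {
            link = &op->next;
        }
    }
    return head;
}
```

Code that wants to alter its behavior at run-time looks for an option of the class and type it understands and falls back to its default when none is attached.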

ASN.1 Object Loaders

About the only time it is efficient to read the lower level ASN.1 raw values is when there are just a few types of simple values that one is interested in processing.  For example, if one wanted to record the relative occurrence of journal titles in some particular ASN.1 file, one could find those without worrying about the objects.  However, most of the time it is much more convenient to load all or a portion of the ASN.1 information into C code structured objects. 

In general, when the ASN.1 stream is positioned at the beginning of a structure, one can call the <OBJECT>AsnRead() function (replacing "<OBJECT>" with some object name), which returns a pointer of the <OBJECT>'s type to an allocated structure.  This structure can then be processed within the C code.  To use these objects, it is convenient to know both the ASN.1 definitions and the C structures, as well as any special function names which operate on them.  For this reason, these different kinds of format descriptions (ASN.1 definitions, C structure definitions, and function prototypes) all appear together, alphabetized by C code object type (if it exists, else using the ASN.1 definition) following this section.  For most objects, there are <OBJECT>New() functions, which allocate memory and set any default values; <OBJECT>Free() functions, which release the memory; and <OBJECT>AsnRead() and <OBJECT>AsnWrite() functions for communication with the ASN.1 I/O stream.  These are true objects in that the upper level objects inherit the slots and "knowledge" about the lower level objects, so that when, for example, an <OBJECT>Free() routine is called on an object which is composed (recursively) of other sub-objects, their <SUB-OBJECT>Free() functions are used as needed.  The <OBJECT>AsnRead() and <OBJECT>AsnWrite() functions behave the same way, since they call the appropriate <SUB-OBJECT>AsnRead() and <SUB-OBJECT>AsnWrite() functions as needed.

The <OBJECT>New() functions take no parameters, and return an <OBJECT>Ptr.  The <OBJECT>Free() functions take an <OBJECT>Ptr parameter, pointing to the object that is to be returned to the heap, and return a NULL pointer of the same type.  The <OBJECT>AsnRead() functions take a pointer to an AsnIo stream (not a FILE *) that was opened with AsnIoOpen() and an AsnTypePtr which points within the parse tree to the type of the Id whose value follows.   An example of what is meant by this follows:

 

if ( -- expect seqentry only ---){

   atp = SEQ_ENTRY;

   while ((atp = AsnReadId(AsnFp, my_amp, atp)) != NULL) {

       the_set = SeqEntryAsnRead(AsnFp, atp);

       /*--process the SeqEntry --*/

       SeqEntryFree(the_set);

   }

} else {

   /*---Expect a BioseqSet----*/

   atp = BIOSEQ_SET;

   while ((atp = AsnReadId(AsnFp, my_amp, atp)) != NULL) {

       if (atp == BIOSEQ_SET_seq_set_E) {

          /*------------

          * The "..._E" is the type of the element of the

          * seq-set.  Generally, when there  are repeating elements

          * of the same type, the "_E" type holds a place in the parse tree.

          *--------------*/

          the_set = SeqEntryAsnRead(AsnFp, atp);

 

          /*--process the SeqEntry --*/

          SeqEntryFree(the_set);

       } else {

          AsnReadVal(AsnFp, atp, NULL);   /* skip values we do not need */

       }

   }

}

 

An <OBJECT>Ptr (or NULL on some error conditions) is returned.  The <OBJECT>AsnWrite() functions take the same parameters as the <OBJECT>AsnRead() functions, with the addition of an <OBJECT>Ptr to the object to be added to the ASN.1 stream.  The return is a Boolean (TRUE on success, FALSE on failure).

In many cases, these standard functions are all that are needed. In some special cases additional functions for comparing, duplicating, or displaying objects are provided as well. The object loaders are discussed in the following chapters which describe the NCBI data objects themselves. Finally there are chapters on utility functions which perform more complex operations on these objects.
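The New/Free convention described above, including the recursive release of sub-objects and the NULL return from Free, can be sketched in standard C as follows. Entry, Mesh, and the four functions are hypothetical objects invented for this illustration; they only imitate the naming pattern and are not the NCBI object loaders.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical objects for illustration; not the NCBI definitions. */
typedef struct mesh {
    char        *term;
    struct mesh *next;
} Mesh;

typedef struct {
    long  uid;
    Mesh *mesh;    /* sub-objects owned by the entry */
} Entry;

/* Allocates a zeroed entry (its "defaults"); NULL on failure. */
Entry *EntryNew(void)
{
    return (Entry *)calloc(1, sizeof(Entry));
}

/* Prepends a copy of term to the entry's mesh list; 1 on success. */
int EntryAddMesh(Entry *ep, const char *term)
{
    Mesh *mp;

    assert(ep != NULL && term != NULL);
    mp = (Mesh *)malloc(sizeof *mp);
    if (mp == NULL)
        return 0;
    mp->term = (char *)malloc(strlen(term) + 1);
    if (mp->term == NULL) {
        free(mp);
        return 0;
    }
    strcpy(mp->term, term);
    mp->next = ep->mesh;
    ep->mesh = mp;
    return 1;
}

/* Frees a whole mesh list; returns NULL so callers can clear the pointer. */
Mesh *MeshFree(Mesh *mp)
{
    while (mp != NULL) {
        Mesh *next = mp->next;
        free(mp->term);
        free(mp);
        mp = next;
    }
    return NULL;
}

/* Frees the entry and, through MeshFree, everything it owns. */
Entry *EntryFree(Entry *ep)
{
    if (ep == NULL)
        return NULL;
    ep->mesh = MeshFree(ep->mesh);
    free(ep);
    return NULL;
}
```

Because each Free function returns NULL, the idiom "ep = EntryFree(ep);" both releases the object and clears the caller's pointer in one statement.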

AsnLib and Object Loaders As a Generalized Iterator

The ability to scan a stream of data and identify and extract data items in a very general way just using their names as defined in their ASN.1 specification is a very powerful aspect of AsnLib functionality. Since the object loader xxxAsnWrite() functions must exhaustively traverse the internal C structures to write them out, and must "know" both the ASN.1 specified type of every structure and field, one can use these functions to create a generalized iterator for the object loader structures in memory.

One can create a "null" output AsnIoPtr (although this will work on a real AsnWrite as well) by using:

AsnIoPtr aip;

 

   aip = AsnIoNullOpen();

One can then associate with the stream a data type from the ASN.1 specification, or a partial path in the ASN.1 specification where each element is separated by dots. "Seq-loc" is the Seq-loc object no matter what its context. "Seq-feat.location" is a Seq-loc ONLY in the "location" slot of a Seq-feat. The Seq-feat itself can be in any context, since that is the top of the partial path. Whenever the object loader AsnWrite routine encounters a data item that satisfies the partial path, it can be made to call a user supplied callback function with arguments of a user defined data object and the data object that would be written. An AsnIoPtr can have as many of these options as desired. More than one callback can be associated with the same data type. More than one data type can be associated with the same callback. Explore options are associated with a stream as in this program, which counts the features in a SeqEntry:

   typedef struct mydata {

       Int2 counter;

   } Mydata, PNTR MydataPtr;

 

/*** counts features in a SeqEntry ***/

Int2 countfeats(SeqEntryPtr sep)

{

   MydataPtr localptr;

   AsnIoPtr aip;

   Int2 num;

 

   localptr = (MydataPtr)MemNew(sizeof(Mydata));

   localptr->counter = 0;

   aip = AsnIoNullOpen();

   AsnExpOptNew(aip, "Seq-feat", (Pointer)localptr, mycallback);

   SeqEntryAsnWrite(sep, aip, NULL);   /* object loader write */

   num = localptr->counter;

   MemFree(localptr);

   AsnIoClose(aip);

   return num;

}

 

void mycallback (AsnExpOptStructPtr aeosp)

{

   SeqFeatPtr sfp;

   MydataPtr mdp;

 

   /*** this will be called at both the beginning and end of writing */

    /**  a structure. Be sure we only act once (at the beginning) */

 

   if (aeosp->dvp->intvalue != START_STRUCT) return;

 

    /** get the SeqFeatPtr ***/

    /** this step is unnecessary in this application.. it's just here */

    /** to show where to get it */

 

   sfp = (SeqFeatPtr) aeosp->the_struct;

 

   /** get the user supplied data **/

 

   mdp = (MydataPtr) aeosp->data;

 

   /*** do the job of counting ****/

 

   mdp->counter++;

 

   /*** that's it *****/

 

   return;

}

The AsnExpOptStruct, aeosp, is not the same as an AsnOption, described earlier. The aeosp->dvp is the DataValPtr which would normally be written out on the AsnWrite(). For primitive types it contains the integer, boolean, real number, CharPtr or ByteStorePtr for the data. For structures like SEQUENCE, SET, etc., it contains the value START_STRUCT or END_STRUCT, and the pointer to the C structure will be in aeosp->the_struct, as above. When the same callback is used for different data types, the data type can be found in aeosp->atp for all types. When writing a CHOICE, a key for the CHOICE is found in aeosp->the_choice, and a value appropriate to the CHOICE is found in aeosp->dvp.  What is delivered for a CHOICE type can be problematic, since for a CHOICE itself, nothing but a type is normally written, so it is a judgment call what to supply in dvp. For these types, one should look at the object loader .c file to be certain what will be passed.

Note that for this iterator to work for structures and choices, AsnOpenStruct(), AsnCloseStruct(), and AsnWriteChoice() must be used in the object loaders.

When the stream is closed, the ExpOpt structures are also freed. If a stream is to be reused then an AsnExpOptFree() function is provided to strip ExpOpts off the stream pointer.

The generalized iterator shown here can be used to treat the object loader structures as a random access database with named keys in memory. It is extremely powerful and flexible. Its main drawback is that it must traverse the whole structure to find the fields of interest. Since this is normally very fast anyway, this is not a major problem at the moment, although for very large objects it may be.

AsnLib and Object Loaders Provide a Generalized Copy and Compare

Any data of arbitrary complexity can be easily copied or compared using the object loaders. Basically the object loader read and write functions, and a pointer to the object to be copied are passed to a function. The functions are then used first to write the struct as ASN.1, to a file or in memory, and then are used to read it back into a new structure, and then return a pointer to the new structure. The compare is done the same way, except one copy is written, then the other is written and, as part of the second write, compared to the first write (only one copy ever actually exists as an ASN.1 stream). This is a byte by byte compare, so the objects must be completely identical to return TRUE.

extern Pointer AsnIoCopy PROTO((Pointer from, AsnReadFunc readfunc,

                                                       AsnWriteFunc writefunc));

 

extern Pointer AsnIoMemCopy PROTO((Pointer from, AsnReadFunc readfunc,

                                                       AsnWriteFunc writefunc));

 

extern Boolean AsnIoMemComp PROTO((Pointer a, Pointer b,

                                                        AsnWriteFunc writefunc));
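The copy-by-writing and compare-by-writing idea can be shown with a toy object in standard C. In the sketch below, Record, Buffer, and the functions record_write(), record_read(), io_copy(), and io_comp() are all hypothetical, and the byte layout is invented for the example. Unlike the description above, this simplified compare keeps both encodings in memory at once.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object and buffer types for illustration only. */
typedef struct { long uid; char term[32]; } Record;
typedef struct { unsigned char bytes[64]; size_t len; } Buffer;

typedef int   (*WriteFunc)(const void *obj, Buffer *out);
typedef void *(*ReadFunc)(const Buffer *in);

/* Encodes uid as 4 big-endian bytes followed by the NUL-terminated term. */
int record_write(const void *obj, Buffer *out)
{
    const Record *r = (const Record *)obj;
    unsigned long u = (unsigned long)r->uid;
    size_t tlen = strlen(r->term) + 1;      /* include the NUL */
    int i;

    for (i = 0; i < 4; i++)
        out->bytes[i] = (unsigned char)((u >> (8 * (3 - i))) & 0xFF);
    memcpy(out->bytes + 4, r->term, tlen);
    out->len = 4 + tlen;
    return 1;
}

/* Decodes a buffer produced by record_write() into a new Record. */
void *record_read(const Buffer *in)
{
    Record *r;
    unsigned long u = 0;
    int i;

    if (in->len < 5 || in->len > 4 + sizeof r->term)
        return NULL;
    r = (Record *)calloc(1, sizeof *r);
    if (r == NULL)
        return NULL;
    for (i = 0; i < 4; i++)
        u = (u << 8) | in->bytes[i];
    r->uid = (long)u;
    memcpy(r->term, in->bytes + 4, in->len - 4);
    return r;
}

/* Copy = write the source, then read the bytes back into a new object. */
void *io_copy(const void *from, ReadFunc readfunc, WriteFunc writefunc)
{
    Buffer buf;

    assert(from != NULL);
    buf.len = 0;
    if (!writefunc(from, &buf))
        return NULL;
    return readfunc(&buf);
}

/* Compare = write both objects and compare the encodings byte by byte. */
int io_comp(const void *a, const void *b, WriteFunc writefunc)
{
    Buffer ba, bb;

    ba.len = bb.len = 0;
    if (!writefunc(a, &ba) || !writefunc(b, &bb))
        return 0;
    return ba.len == bb.len && memcmp(ba.bytes, bb.bytes, ba.len) == 0;
}
```

The generic functions never look inside the object; everything type-specific lives in the read and write functions passed in, which is what makes the approach work for data of arbitrary complexity.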

 

AsnLib Interface: asn.h

/* asn.h

* ===========================================================================

*

*                            PUBLIC DOMAIN NOTICE                         

*               National Center for Biotechnology Information

*                                                                          

*  This software/database is a "United States Government Work" under the  

*  terms of the United States Copyright Act.  It was written as part of   

*  the author's official duties as a United States Government employee and

*  thus cannot be copyrighted.  This software/database is freely available

*  to the public for use. The National Library of Medicine and the U.S.   

*  Government have not placed any restriction on its use or reproduction. 

*                                                                          

*  Although all reasonable efforts have been taken to ensure the accuracy 

*  and reliability of the software and data, the NLM and the U.S.         

*  Government do not and cannot warrant the performance or results that   

*  may be obtained by using this software or data. The NLM and the U.S.   

*  Government disclaim all warranties, express or implied, including      

*  warranties of performance, merchantability or fitness for any particular

*  purpose.                                                               

*                                                                         

*  Please cite the author in any work or product based on this material.  

*

* ===========================================================================

*

* File Name: asn.h

*

* Author:  James Ostell

*

* Version Creation Date: 1/1/91

*

* $Revision: 1.3 $

*

* File Description:

*   This header is the interface to all the routines in the ASN.1 libraries

*     that an application should ever use.  It also includes the necessary

*     typedefs -- however the application programmer is not meant to use

*     the internal structures directly outside of the specified functions,

*     as the internal structures may be changed without notice.

*

* Modifications: 

* --------------------------------------------------------------------------

* Date     Name        Description of modification

* -------  ----------  -----------------------------------------------------

*

*

* ==========================================================================

*/

 

#ifndef _ASNTOOL_

#define _ASNTOOL_

                      /*** depends on NCBI core routines ***/

#ifndef _NCBI_

#include <ncbi.h>

#endif

 

#ifdef __cplusplus

extern "C" {

#endif

 

   /**** ValNode is used for internal representation of values from

   ****  CHOICE, SET OF, SEQ OF and combinations for many cases.

   ****  It is provided in ncbimisc for the object-building routines ****/

 

/***  The following defines can be used for backward compatibility

#define AsnValue DataVal

#define AsnNode ValNode

***/

/***  In addition, AsnValueNode was changed to AsnValxNode so it would

      not conflict with the AsnValue define above

****/

 

#ifndef START_STRUCT

#define START_STRUCT       411           /* { found */

#define END_STRUCT         412           /* } found */

#endif

 

typedef struct asnvaluenode {

   Int2 valueisa;

   CharPtr name;           /* use for strings and named int */

   Int4 intvalue;              /* use for int and boolean */

   FloatHi realvalue;

   struct asnvaluenode PNTR next;

}  AsnValxNode, PNTR AsnValxNodePtr;

 

   /******** AsnType is a node in the AsnTool parse tree *******/

 

typedef struct asntype {

   Int2 isa;

   CharPtr name;

   Uint1 tagclass;

   Int2 tagnumber;

   Boolean implicit;

   Boolean optional;

   Boolean hasdefault;

   Boolean exported;

   Boolean imported;

   Boolean resolved;

   AsnValxNodePtr defaultvalue;          /* used for default value, range, subtypes */

   struct asntype PNTR type;

   Pointer branch;                       /* used for named ints, enum, set, sequence */

   Int2 tmp;     /* for temporary ->type link to local tree */

   struct asntype PNTR next;

}  AsnType, PNTR AsnTypePtr;

 

typedef struct asnmodule {

   CharPtr modulename;

   CharPtr filename;           /* if module to be loaded from disk */

   AsnTypePtr types;

   AsnTypePtr values;

   struct asnmodule PNTR next;    /* for chain of modules */

   Int2 lasttype;          /* for isa defined types */

   Int2 lastvalue;         /* for isa defined values */

}  AsnModule, PNTR AsnModulePtr;

 

#define ASNIO_BUFSIZE      1024    /* default size of AsnIo.buf */

                                /* AsnIo.type  bit[0]=text? bit[1]=binary?  */

                                /* bit[2]=input? bit[3]=output?             */

                                /* bit[4]=file? bit[5]=carrier?             */

#define ASNIO_TEXT  1

#define ASNIO_BIN   2

#define ASNIO_IN    4

#define ASNIO_OUT   8

#define ASNIO_FILE  16

#define ASNIO_CARRIER   32     /* is a pure iterator */

 

#define ASNIO_TEXT_IN      21     /* AsnIo.type */

#define ASNIO_TEXT_OUT     25

#define ASNIO_BIN_IN 22

#define ASNIO_BIN_OUT      26

 

typedef struct pstack {

    AsnTypePtr type;           /* type at this level of stack */

    Int4 len;                  /* length of item for binary decode */

    Boolean resolved;          /* resolution of type for binary decode */

    Boolean tag_indef;         /* indefinite tag length on input? */

} Pstack, PNTR PstackPtr;

 

typedef void (* AsnOptFreeFunc) PROTO ((Pointer));

 

typedef struct asnopt {

   Int2 ao_class;            /* class of option; all negative numbers are reserved */

   Int2 type;                /* type within ao_class */

   DataVal data;            /* data used for setting option */

   AsnOptFreeFunc freefunc;  /* function to free data.ptrvalue */

   struct asnopt PNTR next;

} AsnOption, PNTR AsnOptionPtr;

 

typedef struct asnexpoptstruct {

   struct asnio PNTR aip;

   AsnTypePtr atp;

   DataValPtr dvp;

   Int2 the_choice;

   Pointer the_struct;

   Pointer data;

} AsnExpOptStruct, PNTR AsnExpOptStructPtr;

 

typedef void (* AsnExpOptFunc) PROTO ((AsnExpOptStructPtr));

#define NO_CHOICE_SET INT2_MIN     /* for AsnExpOptStruct.the_choice  */

 

typedef struct expopt {

   Int2 numtypes;

   AsnTypePtr PNTR types;       /* the types to check */

   Pointer user_data;           /* user supplied data */

   AsnExpOptFunc user_callback; /* user supplied callback function */

   struct expopt PNTR next;

} AsnExpOpt, PNTR AsnExpOptPtr;

 

typedef void ( *ErrorRetType) PROTO((Int2, CharPtr));

typedef Int2 ( *IoFuncType) PROTO((Pointer, CharPtr, Uint2));

 

typedef struct asnio {

   CharPtr linebuf;

   Int1 type;            /* type- text-in, text-out, bin-in, bin-out */

   Int2 linepos;         /* current offset in linebuf */

   FILE * fp;            /* file to read from or write to */

   BytePtr buf;          /* buffer for I/O */

   Int2 bufsize;         /* size of this buffer */

   Int2 bytes,           /* bytes of data available in buf */

       offset;           /* current offset of processing in buf */

   Uint1 tagclass;       /* last BER tag-id-len read */

   Int2 tagnumber;

   Boolean constructed;

   Int4 length;          /* length of BER encoded data */

   Boolean tagsaved;     /* TRUE if tag info already here - stops read */

   Int4 used;            /* if tagsaved, bytes used recorded here */

   Int1 tabsize,         /* spaces per tab */

       indent_level,     /* current indent level for print output */

       linelength,      /* max line length on output */

       max_indent,       /* current maximum indent levels for first */

       state;            /* parsing state */

    BoolPtr first;        /* for first element on indented line for printing */

   Int4 linenumber;      /* for reporting errors */

   CharPtr word;           /* current word in linebuf */

   Int2 wordlen,         /* length of word in linebuf */

        token;           /* current parsing token for word */

    PstackPtr typestack;  /* the parsing stack for input and output */

   Int1 type_indent,     /* used like indent_level and max_indent, but for */

       max_type;         /* typestack */

   ErrorRetType error_ret;     /* user error return */

    Pointer iostruct;    /* non-FILE io structure */

    IoFuncType readfunc,      /* read/write functions for sockets */

          writefunc;     /*  open and close MUST be done outside AsnIo */

   Boolean read_id;     /* for checking AsnReadId AsnReadVal alternation */

   CharPtr fname;       /* name of file in use */

   AsnOptionPtr aop;    /* head of options chain */

   AsnExpOptPtr aeop;   /* exploration options chain */

   AsnExpOptStructPtr aeosp;

   Boolean io_failure;  /* set on failed write */

} AsnIo, PNTR AsnIoPtr;

 

typedef struct asniomem {    /* for AsnIo to and from a memory block */

   AsnIoPtr aip;                  /* the AsnIoPtr for this */

   BytePtr buf;                   /* a buffer for the data */

   Uint2 size,             /* size of this buffer (w) or bytes_to_read (r) */

          count;           /* count of bytes read from or written to buffer */

} AsnIoMem, PNTR AsnIoMemPtr;

 

typedef struct asniobs {    /* for AsnIo to and from a memory ByteStore */

   AsnIoPtr aip;                  /* the AsnIoPtr for this */

   ByteStorePtr bsp;        /* byte store for this */

} AsnIoBS, PNTR AsnIoBSPtr;

 

/***** typedefs used often in object loaders **********/

 

typedef Pointer (* AsnReadFunc) PROTO((AsnIoPtr aip, AsnTypePtr atp));

typedef Boolean (* AsnWriteFunc) PROTO((Pointer object, AsnIoPtr aip, AsnTypePtr atp));

 

/*****************************************************************************

*

*   prototypes

*     

*****************************************************************************/

/*** asngen.c ****/

 

extern AsnTypePtr AsnReadId PROTO((AsnIoPtr aip, AsnModulePtr amp, AsnTypePtr atp));

extern Int2 AsnReadVal PROTO((AsnIoPtr aip, AsnTypePtr atp, DataValPtr vp));

extern Boolean AsnWrite PROTO((AsnIoPtr aip, AsnTypePtr atp, DataValPtr dvp));

extern Boolean AsnSkipValue PROTO((AsnIoPtr aip, AsnTypePtr atp));

 

extern Boolean AsnOpenStruct PROTO((AsnIoPtr aip, AsnTypePtr atp,

          Pointer the_struct));

extern Boolean AsnCloseStruct PROTO((AsnIoPtr aip, AsnTypePtr atp,

          Pointer the_struct));

extern Boolean AsnWriteChoice PROTO((AsnIoPtr aip, AsnTypePtr atp, Int2 choice,

          DataValPtr the_value));

extern void AsnCheckExpOpt PROTO((AsnIoPtr aip, AsnTypePtr atp, DataValPtr dvp));

extern AsnExpOptPtr AsnExpOptNew PROTO((AsnIoPtr aip, CharPtr path,

          Pointer user_data, AsnExpOptFunc user_func));

extern AsnExpOptPtr AsnExpOptFree PROTO((AsnIoPtr aip, AsnExpOptPtr aeop));

 

extern Int2 AsnGetLevel PROTO((AsnIoPtr aip));

extern void AsnNullValueMsg PROTO((AsnIoPtr aip, AsnTypePtr node));

 

/*** asntypes.c ***/

 

extern void AsnKillValue PROTO((AsnTypePtr atp, DataValPtr dvp));

extern AsnTypePtr PNTR AsnTypePathFind PROTO((AsnModulePtr amp, CharPtr str, Int2Ptr numtypes));

extern AsnTypePtr AsnTypeFind PROTO((AsnModulePtr amp, CharPtr str));

#define AsnFind(x) AsnTypeFind(NULL,x)    /* find type (all) */

extern CharPtr AsnFindPrimName PROTO((AsnTypePtr atp));

extern CharPtr AsnFindBaseName PROTO((AsnTypePtr atp));

extern AsnTypePtr AsnLinkType PROTO((AsnTypePtr type, AsnTypePtr localtype));

extern void AsnUnlinkType PROTO((AsnTypePtr type));

extern CharPtr AsnTypeDumpStack PROTO((CharPtr str, AsnIoPtr aip));

extern Boolean AsnTreeLoad PROTO((char * file, AsnValxNodePtr * avnptr, AsnTypePtr * atptr, AsnModulePtr * ampptr));

#define AsnLoad() AsnTreeLoad(asnfilename, &avn, &at, &amp)   /* simple loader */

extern void AsnModuleLink PROTO((AsnModulePtr amp));

extern CharPtr AsnEnumStr PROTO((CharPtr str, Int2 val));

extern CharPtr AsnEnumTypeStr PROTO((AsnTypePtr atp, Int2 val));

extern AsnModulePtr AsnAllModPtr PROTO((void));

 

/*** asnio.c ****/

 

extern AsnIoPtr AsnIoOpen PROTO((CharPtr file_name, CharPtr mode));

extern AsnIoPtr AsnIoClose PROTO((AsnIoPtr aip));

extern void AsnIoReset PROTO((AsnIoPtr aip));

extern void AsnIoSetErrorMsg PROTO((AsnIoPtr aip, ErrorRetType error_ret));

extern Int4 AsnIoSeek PROTO((AsnIoPtr aip, Int4 pos));

extern Int4 AsnIoTell PROTO((AsnIoPtr aip));

extern void AsnIoFlush PROTO((AsnIoPtr aip));

extern AsnIoPtr AsnIoNew PROTO((Int1 type, FILE * fp, Pointer iostruct, IoFuncType readfunc, IoFuncType writefunc));

extern Boolean AsnIoSetBufsize PROTO((AsnIoPtr aip, Int2 size));

extern AsnOptionPtr AsnIoOptionNew PROTO((AsnIoPtr aip, Int2 ao_class, Int2 type, DataVal av, AsnOptFreeFunc freefunc));

extern void AsnIoOptionFree PROTO((AsnIoPtr aip, Int2 ao_class, Int2 type));

extern Boolean AsnClassTypeMatch PROTO((Int2 ao_class, Int2 type, Int2 this_class, Int2 this_type));

extern AsnOptionPtr AsnIoOptionGet PROTO((AsnIoPtr aip, Int2 ao_class, Int2 type,

                                  AsnOptionPtr last));

extern AsnOptionPtr AsnOptionNew PROTO((AsnOptionPtr PNTR aopp, Int2 ao_class, Int2 type, DataVal av, AsnOptFreeFunc freefunc));

extern void AsnOptionFree PROTO((AsnOptionPtr PNTR aopp, Int2 ao_class, Int2 type));

extern AsnOptionPtr AsnOptionGet PROTO((AsnOptionPtr head, Int2 ao_class, Int2 type,

                                  AsnOptionPtr last));

 

   /*** read and write to memory buffer ***/

extern AsnIoMemPtr AsnIoMemOpen PROTO((CharPtr mode, BytePtr buf, Uint2 size));

extern AsnIoMemPtr AsnIoMemClose PROTO((AsnIoMemPtr aimp));

extern Boolean AsnIoMemReset PROTO((AsnIoMemPtr aimp, Uint2 bytes_to_read));

extern Int2 AsnIoMemRead PROTO((Pointer, CharPtr, Uint2));

extern Int2 AsnIoMemWrite PROTO((Pointer, CharPtr, Uint2));

 

   /*** read and write to a ByteStore in memory ***/

extern AsnIoBSPtr AsnIoBSOpen PROTO((CharPtr mode, ByteStorePtr bsp));

extern AsnIoBSPtr AsnIoBSClose PROTO((AsnIoBSPtr aibp));

extern Int2 AsnIoBSRead PROTO((Pointer, CharPtr, Uint2));

extern Int2 AsnIoBSWrite PROTO((Pointer, CharPtr, Uint2));

 

  /** Copy and Compare functions ***/

extern Pointer AsnIoCopy PROTO((Pointer from, AsnReadFunc readfunc, AsnWriteFunc writefunc));

extern Pointer AsnIoMemCopy PROTO((Pointer from, AsnReadFunc readfunc, AsnWriteFunc writefunc));

extern Boolean AsnIoMemComp PROTO((Pointer a, Pointer b, AsnWriteFunc writefunc));

 

#define AsnIoNullOpen() AsnIoNew((ASNIO_OUT | ASNIO_TEXT | ASNIO_CARRIER), NULL, NULL, NULL, NULL)

 

/*** asndebin.c ***/

 

extern AsnTypePtr AsnBinReadId PROTO((AsnIoPtr aip, AsnTypePtr atp));

extern Int2 AsnBinReadVal PROTO((AsnIoPtr aip, AsnTypePtr atp, DataValPtr vp));

 

/*** asnenbin.c ***/

 

extern Boolean AsnBinWrite PROTO((AsnIoPtr aip, AsnTypePtr atp, DataValPtr dvp));

         /** expert use only ***/

extern void AsnEnBinBytes PROTO((Pointer ptr, Uint4 len, AsnIoPtr aip));

 

/*** asnlex.c ***/

 

extern AsnTypePtr AsnTxtReadId PROTO((AsnIoPtr aip, AsnModulePtr amp, AsnTypePtr atp));

extern Int2 AsnTxtReadVal PROTO((AsnIoPtr aip, AsnTypePtr atp, DataValPtr vp));

 

/*** asnprint.c ***/

 

extern Boolean AsnTxtWrite PROTO((AsnIoPtr aip, AsnTypePtr atp, DataValPtr dvp));

 

/*** asnlext.c ***/

 

extern AsnModulePtr AsnLoadModules PROTO((AsnIoPtr aip));

 

/******** temporary defines for older code *************/

 

#define AsnStartStruct(x,y) AsnOpenStruct(x, y, NULL)

#define AsnEndStruct(x,y) AsnCloseStruct(x, y, NULL)

 

/***** AsnOption ao_class values - do not reuse ***************/

/***** all positive numbers are available to non-NCBI applications ***/

 

#define OP_ANY          0

#define OP_TOGENBNK    -1

#define OP_BB2ASN      -2

#define OP_NCBIOBJSSET -3

#define OP_NCBIOBJSEQ  -4

#define OP_GET_MUID    -5

 

 

#ifdef __cplusplus

}

#endif

 

#endif
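
The AsnIo.type values defined in asn.h are bit flags, and each composite constant (ASNIO_TEXT_IN and so on) is the bitwise OR of three of them. The sketch below illustrates that arithmetic only. It copies the numeric values from the listing above so that it compiles without ncbi.h; the helper names asnio_composites_consistent, asnio_type_describe, and asnio_flags_demo are hypothetical and are not part of AsnLib.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Flag values copied from the asn.h listing above. */
#define ASNIO_TEXT      1
#define ASNIO_BIN       2
#define ASNIO_IN        4
#define ASNIO_OUT       8
#define ASNIO_FILE      16
#define ASNIO_CARRIER   32

#define ASNIO_TEXT_IN   21
#define ASNIO_TEXT_OUT  25
#define ASNIO_BIN_IN    22
#define ASNIO_BIN_OUT   26

/* Hypothetical helper: returns 1 if every composite constant equals
   the OR of its file, direction, and encoding flags. */
static int asnio_composites_consistent(void)
{
    return ASNIO_TEXT_IN  == (ASNIO_FILE | ASNIO_IN  | ASNIO_TEXT)
        && ASNIO_TEXT_OUT == (ASNIO_FILE | ASNIO_OUT | ASNIO_TEXT)
        && ASNIO_BIN_IN   == (ASNIO_FILE | ASNIO_IN  | ASNIO_BIN)
        && ASNIO_BIN_OUT  == (ASNIO_FILE | ASNIO_OUT | ASNIO_BIN);
}

/* Hypothetical helper: writes a readable description of an
   AsnIo.type value into out (at most outlen bytes) and returns out. */
static const char *asnio_type_describe(int type, char *out, size_t outlen)
{
    snprintf(out, outlen, "%s%s%s%s",
        (type & ASNIO_TEXT) ? "text" :
        (type & ASNIO_BIN)  ? "binary" : "untyped",
        (type & ASNIO_IN)   ? " input" :
        (type & ASNIO_OUT)  ? " output" : "",
        (type & ASNIO_FILE)    ? " file" : "",
        (type & ASNIO_CARRIER) ? " carrier" : "");
    return out;
}

/* Exercises the helpers on the composite constants and on the flag
   combination that the AsnIoNullOpen() macro passes to AsnIoNew(). */
void asnio_flags_demo(void)
{
    char buf[48];
    int null_type = ASNIO_OUT | ASNIO_TEXT | ASNIO_CARRIER;

    assert(asnio_composites_consistent());
    printf("%d: %s\n", ASNIO_TEXT_IN,
           asnio_type_describe(ASNIO_TEXT_IN, buf, sizeof buf));
    printf("%d: %s\n", ASNIO_BIN_OUT,
           asnio_type_describe(ASNIO_BIN_OUT, buf, sizeof buf));
    asnio_type_describe(null_type, buf, sizeof buf);
    assert(strcmp(buf, "text output carrier") == 0);
    printf("%d: %s\n", null_type, buf);
}
```

Calling asnio_flags_demo() from a main function prints one line per type value. Code that uses the real library should test the flags on an AsnIo through the macros in asn.h rather than through copied constants, since the header warns that internal structures may change without notice.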