National Center for Biotechnology Information
National Library of Medicine
National Institutes of Health
Building 38A
8600 Rockville Pike
Bethesda, MD 20894
301-496-2475
FAX 301-480-9241
NCBI Software Development ToolKit
Version 1.9 - August 1, 1994
Draft Copy
This documentation is always incomplete and under revision.
Short Table of Contents
Full Table of Contents
Overview
Data Model
CoreLib: Portable Core Library
AsnLib: ASN.1 Processing
General Use Objects
Bibliographic References
MEDLINE Data
Biological Sequences
Collections of Sequences
Sequence Locations and Identifiers
Sequence Features
Sequence Alignments
Sequence Graphs
Sequence Utilities
Entrez Data Access
Vibrant User Interface Tools
Full Table of Contents
Overview
  Introduction
  Components Of The Software Development ToolKit
    ASN.1
    Data Model For Biological Sequences
    CoreLib: Writing Portable Software
    AsnLib: Reading and Writing ASN.1
    Object Loaders: Combining AsnLib and the Data Model
    Utilities
    Data Access
    Vibrant: A Portable Windowing System
  A Few Samples
  Using This Document
  Contacting NCBI
Data Model
  Introduction
  Biological Sequences
  Classes of Biological Sequences
  Locations on Biological Sequences
  Associating Annotation With Locations On Biological Sequences
    Feature Tables
    Sequence Alignments
    Sequence Graph
  Collections of Related Biological Sequences
  Consequences of the Data Model
CoreLib: Portable Core Library
  Introduction
  Application Frameworks
    Main Entry Point
    Getting Program Arguments
  User Interface Elements
    Alerts
    Beeps
    Monitors
  Configuration Files
    File Names
    File Format
    Configuration File Functions
  Error Processing
    Posting An Error
    User Error Strings
    Customization
    Configuration File Settings
    Preparing Error Message Files
    Fetching and Displaying Errors
    Installing Custom Error Handlers
    Miscellaneous Utility Functions
  Files and Directories
    ANSI-Style Functions
    Directory Management
    CD-ROM
    Customization
  Memory Management
    ANSI-Style Functions
    Fixed Memory
    Relocatable Memory
  Byte Stores
  String Functions
    ANSI-Style Functions
    Additional String Functions
    Number Strings
    Time Strings
    SGML Strings
  ValNode Functions
  Math Functions
    Macros
    Arithmetic Functions
    Transcendental Functions
    Gamma Functions
    Advanced Functions
  Miscellaneous Utilities
    Macros
    Random Numbers
    Sorting
    Time
    Process ID
    Application Properties
    Debugging Macros
  Portability Issues
    Portable Types
      Integral Types
      Floating-point Types
      Pointer Types
      Avoiding Name Collisions
    Byte Order
    Function Prototypes
AsnLib: ASN.1 Processing
  Introduction to ASN.1
    Why ASN.1
    Structure of ASN.1
    Further information about ASN.1
  AsnLib: Overview
  Principles of Operation
  Specification for AsnLib
  AsnTool
  AsnTool Tutorial
  Using AsnLib
  AsnLib: A Tutorial
    getmesh.c
    indexpub.c
    getpub.c
  Data-links
  AsnLib Generated Header Files
  Returns From AsnLib Parsing
  Finding AsnTypePtrs at Run-time
  Custom Read and Write Functions
  Customizing an AsnIo Stream
  ASN.1 Object Loaders
  AsnLib and Object Loaders As a Generalized Iterator
  AsnLib and Object Loaders Provide a Generalized Copy and Compare
  AsnLib Interface: asn.h
General Use Objects
  Introduction
  Large Text Blocks: StringStore
  The Date
  Identifying Things: Object-id
  Identifying Things: Dbtag
  Identifying People: Person-id
  Expressing Uncertainty with Fuzzy Integers: Int-fuzz
  Creating Your Own Objects: User-object
  ASN.1 Specification: general.asn
  C Structures and Functions: objgen.h
Bibliographic References
  Introduction
  Citation Components: Affiliation
  Citation Components: Authors
  Citation Components: Imprint
  Citation Components: Title
  Citing an Article
  Citing a Journal
  Citing a Book
  Citing a Proceedings
  Citing a Letter, Manuscript, or Thesis
  Citing Directly Submitted Data
  Citing a Patent
  Identifying a Patent
  Citing an Article or Book which is In Press
  Special Cases: Unpublished, Unparsed, or Unusual
  Accommodating Any Publication Type
  Grouping Different Forms of Citation for a Single Work
  Sets of Citations
  Comparing Citations
  ASN.1 Specification: biblio.asn
  C Structures and Functions: objbibli.h
  ASN.1 Specification: pub.asn
  C Structures and Functions: objpub.h
MEDLINE Data
  Introduction
  Structure of a MEDLINE Entry
  MeSH Index Terms
  Substance Records
  Database Cross Reference Records
  Funding Identifiers
  Gene Symbols
  ASN.1 Specification: medline.asn
  C Structures and Functions: objmedli.h
Biological Sequences
  Introduction
  Bioseq: the Biological Sequence
  Seq-id: Identifying the Bioseq
  Seq-annot: Annotating the Bioseq
  Seq-descr: Describing the Bioseq and Placing It In Context
    mol-type: The Molecule Type
    modif: Modifying Our Assumptions About a Bioseq
    method: Protein Sequencing Method
    name: A Descriptive Name
    title: A Descriptive Title
    org: What Organism Did this Come From?
    comment: Commentary Text
    num: Applying a Numbering System to a Bioseq
    maploc: Map Location
    pir: PIR Specific Data
    sp: SWISSPROT Data
    embl: EMBL Data
    prf: PRF Data
    pdb: PDB Data
    genbank: GenBank Flatfile Specific Data
    pub: Description of a Publication
    region: Name of a Genomic Region
    user: A User-defined Structured Object
    neighbors: Bioseqs Related by Sequence Similarity
    create-date:
    update-date:
    het: Heterogen
  Seq-inst: Instantiating the Bioseq
    Seq-inst: Virtual Bioseq
    Seq-inst: Raw Bioseq
    Seq-inst: Segmented Bioseq
    Seq-inst: Reference Bioseq
    Seq-inst: Constructed Bioseq
    Seq-inst: Typical or Consensus Bioseq
    Seq-inst: Map Bioseqs
  Seq-hist: History of a Seq-inst
  Seq-data: Encoding the Sequence Data Itself
    IUPACaa: The IUPAC-IUB Encoding of Amino Acids
    NCBIeaa: Extended IUPAC Encoding of Amino Acids
    NCBIstdaa: A Simple Sequential Code for Amino Acids
    NCBI8aa: An Encoding for Modified Amino Acids
    IUPAC3aa: A 3 Letter Display Code for Amino Acids
    NCBIpaa: A Profile Style Encoding for Amino Acids
    IUPACna: The IUPAC-IUB Encoding for Nucleic Acids
    NCBI4na: A Four Bit Encoding of Nucleic Acids
    NCBI2na: A Two Bit Encoding for Nucleic Acids
    NCBI8na: An Eight Bit Sequential Encoding for Modified Nucleic Acids
    NCBIpna: A Frequency Profile Encoding for Nucleic Acids
  Tables of Sequence Codes
  Mapping Between Different Sequence Alphabets
  Data and Tools for Sequence Alphabets
  Pubdesc: Publication Describing a Bioseq
  Numbering: Applying a Numbering System to a Bioseq
    Num-cont: A Continuous Integer Numbering System
    Num-real: A Real Number Numbering Scheme
    Num-enum: An Enumerated Numbering Scheme
    Num-ref: Numbering by Reference to Another Bioseq
    Numbering: C Structures and Utility Functions
  ASN.1 Specification: seq.asn
  ASN.1 Specification: seqblock.asn
  ASN.1 Specification: seqcode.asn
  C Structures and Functions: objseq.h
  C Structures and Functions: objpubd.h
  C Structures and Functions: objblock.h
  C Structures and Functions: objcode.h
Collections of Sequences.....................................................................................................................................
Introduction...............................................................................................................................................
Seq-entry: The Sequence Entry.............................................................................................................
Bioseq-set: A Set Of Seq-entrys.............................................................................................................
id: local identifier for this set...................................................................................................
coll: global identifier for this set.............................................................................................
level: nesting level of set..........................................................................................................
class: classification of sets.........................................................................................................
release: an explanatory string..................................................................................................
date:..............................................................................................................................................
descr: Seq-descr for this set.......................................................................................................
seq-set: the sequences and sets within the Bioseq-set..........................................................
annot: Seq-annots for the set....................................................................................................
Bioseq-sets are Convenient Packages................................................................................................
ASN.1 Specification: seqset.asn...........................................................................................................
C Structures and Functions: objsset.h................................................................................................
Sequence Locations and Identifiers.................................................................................................................
Introduction...............................................................................................................................................
Seq-id: Identifying Sequences...............................................................................................................
Seq-id: Semantics of Use........................................................................................................................
local: Privately Maintained Data.............................................................................................
other: A Local Textseq-id..........................................................................................................
general: Ids from Local Databases..........................................................................................
gibbsq, gibbmt: GenInfo Backbone Ids..................................................................................
genbank, embl, ddbj: The International Nucleic Acid Sequence Databases....................
pir: PIR International.................................................................................................................
swissprot: SWISS-PROT.............................................................................................................
prf: Protein Research Foundation...........................................................................................
patent: Citing a Patent...............................................................................................................
pdb: Citing a Biopolymer Chain from a Structure Database.............................................
giim: GenInfo Import Id...........................................................................................................
gi: A Stable, Uniform Id Applied to Sequences From All Sources....................................
Seq-id: The C Implementation..............................................................................................................
NCBI ID Database: Imposing Stable Seq-ids....................................................................................
Seq-loc: Locations on a Bioseq.............................................................................................................
null: A Gap..................................................................................................................................
empty: A Gap in an Alignment...............................................................................................
whole: A Reference to a Whole Bioseq..................................................................................
int: An Interval on a Bioseq......................................................................................................
packed-int: A Series of Intervals..............................................................................................
pnt: A Single Point on a Sequence...........................................................................................
packed-pnt: A Collection of Points.........................................................................................
mix: An Arbitrarily Complex Location.................................................................................
equiv: Equivalent Locations.....................................................................................................
bond: A Chemical Bond Between Two Residues..................................................................
feat: A Location Indirectly Referenced Through A Feature................................................
Seq-loc: The C Implementation............................................................................................................
ASN.1 Specification: seqloc.asn..........................................................................................................
C Structures and Functions: objloc.h.................................................................................................
Sequence Features.................................................................................................................................................
Introduction...............................................................................................................................................
Seq-feat: Structure of a Feature.............................................................................................................
id: Features Can Have Identifiers...........................................................................................
data: Structured Data Makes Feature Types Unique............................................................
partial: This Feature is Incomplete.........................................................................................
except: There is Something Biologically Exceptional..........................................................
comment: A Comment About This Feature..........................................................................
product: Does This Feature Produce Another Bioseq?........................................................
location: Source Location of This Feature..............................................................................
qual: GenBank Style Qualifiers................................................................................................
title: A User Defined Name......................................................................................................
ext: A User Defined Structured Extension..............................................................................
cit: Citations For This Feature.................................................................................................
exp-ev: Experimental Evidence...............................................................................................
xref: Linking To Other Features..............................................................................................
SeqFeatData: Type Specific Feature Data..........................................................................................
gene: Location Of A Gene.........................................................................................................
org: Source Organism Of The Bioseq......................................................................................
cdregion: Coding Region.........................................................................................................
prot: Describing A Protein.......................................................................................................
rna: Describing An RNA...........................................................................................................
pub: Publication About A Bioseq Region..............................................................................
seq: Tracking Original Sequence Sources..............................................................................
imp: Importing Features From Other Data Models.............................................................
region: A Named Region..........................................................................................................
comment: A Comment On A Region Of Sequence..............................................................
bond: A Bond Between Residues.............................................................................................
site: A Defined Site.....................................................................................................................
rsite: A Restriction Enzyme Cut Site......................................................................................
user: A User Defined Feature...................................................................................................
txinit: Transcription Initiation.................................................................................................
num: Applying Custom Numbering To A Region..............................................................
psec-str: Protein Secondary Structure.....................................................................................
non-std-residue: Unusual Residues.........................................................................................
het: Heterogen............................................................................................................................
Seq-feat Implementation in C...............................................................................................................
CdRegion: Coding Region.....................................................................................................................
orf: Open Reading Frame.........................................................................................................
Translation Information...........................................................................................................
Problems With Translations....................................................................................................
Genetic Codes...........................................................................................................................................
C Implementation Of Genetic Codes.....................................................................................
Rsite-ref: Reference To A Restriction Enzyme.................................................................................
RNA-ref: Reference To An RNA..........................................................................................................
Gene-ref: Reference To A Gene.............................................................................................................
Prot-ref: Reference To A Protein...........................................................................................................
Txinit: Transcription Initiation............................................................................................................
Current Genetic Code Table: gc.prt.....................................................................................................
ASN.1 Specification: seqfeat.asn.........................................................................................................
C Structures and Functions: objfeat.h................................................................................................
Sequence Alignments...........................................................................................................................................
Introduction...............................................................................................................................................
Seq-align.....................................................................................................................................................
type: global.................................................................................................................................
type: partial.................................................................................................................................
type: diags...................................................................................................................................
dim: Dimensionality Of The Alignment................................................................................
Score: Score Of An Alignment Or Segment.......................................................................................
Dense-diag: Segments For "diags" Seq-align...................................................................................
Dense-seg: Segments for "global" or "partial" Seq-align...............................................................
Std-seg: Aligning Any Bioseq Type With Any Other....................................................................
ASN.1 Specification: seqalign.asn......................................................................................................
C Structures and Functions: objalign.h.............................................................................................
Sequence Graphs...................................................................................................................................................
Introduction...............................................................................................................................................
Seq-graph: Graph on a Bioseq..............................................................................................................
ASN.1 Specification: seqres.asn..........................................................................................................
C Structures and Functions: objres.h.................................................................................................
Sequence Utilities...................................................................................................................................................
Introduction...............................................................................................................................................
Demo: seqtest.c..........................................................................................................................................
Finding Features and Descriptors in an Entry................................................................................
Exploring an Object Using ASN.1 Defined Names.......................................................................
C Structures and Functions: sequtil.h................................................................................................
C Structures and Functions: seqport.h..............................................................................................
Entrez Data Access.................................................................................................................................................
Introduction...............................................................................................................................................
Connecting To and Disconnecting From Data Sources................................................................
Scanning the List of Available Terms................................................................................................
Obtaining the UID Given an Accession Number...........................................................................
Obtaining the UIDs That Satisfy a Boolean Query........................................................................
Loading a Sequence Record..................................................................................................................
Loading a MEDLINE Record...............................................................................................................
Streaming Through All of the Data Records....................................................................................
Converting to FASTA Format...............................................................................................................
Converting to GenBank Format................................................................................................................
Converting to MEDLARS Format........................................................................................................
Loading a Document Summary...........................................................................................................
Loading a Set of Document Summaries............................................................................................
Retrieving Neighbors and Links.........................................................................................
C Structures and Functions: accentr.h...............................................................................................
C Structures and Functions: casn.h....................................................................................................
Vibrant User Interface Tools..............................................................................................................................
Introduction...............................................................................................................................................
Programming Example..........................................................................................................................
Object Specification....................................................................................................................
Callback Functions....................................................................................................................
Reference....................................................................................................................................................
Object Data Types.......................................................................................................................
Callback Types...........................................................................................................................
General Global Variables.........................................................................................................
Window Objects.........................................................................................................................
Context Functions......................................................................................................................
Grouping Objects.......................................................................................................................
Button Objects.............................................................................................................................
List Objects..................................................................................................................................
Menu Objects...............................................................................................................................
Popup Object...............................................................................................................................
Prompt Object.............................................................................................................................
Text Objects.................................................................................................................................
Scroll Bar Object.........................................................................................................................
Slate and Panel Objects..............................................................................................................
Repeat Object..............................................................................................................................
Switch Object...............................................................................................................................
Icon Object...................................................................................................................................
Graphical Viewer Object...........................................................................................................
Doc Object....................................................................................................................................
Class Functions...........................................................................................................................
Miscellaneous Functions...........................................................................................................
Graphical Drawing Functions..................................................................................................
Index............................................................................................................................................................
Acknowledgments...................................................................................................................................
Trademarks................................................................................................................................................
Introduction
Components Of The Software Development ToolKit
A Few Samples
Using This Document
Contacting NCBI
Molecular biology is generating a host of data that are dramatically altering and deepening our understanding of the processes underlying all living things. This new knowledge is already affecting medicine, agriculture, biotechnology, and basic science in fundamental and sweeping ways. However, the data on which this understanding rests are being accumulated and analyzed in thousands of laboratories all over the world, from large genome centers to small university laboratories, and from large pharmaceutical companies to small biotech startups. They are being managed and analyzed on machines ranging from personal computers to supercomputers, and on systems ranging from a few disk files to large commercial database systems. Because these essential new data require specialized tools for analysis and management, software tools are being developed in all of these environments at once. Since molecular biology is a young science, the data themselves are not yet fully understood, so their fundamental properties and relationships are constantly being revised as well. Finally, the raw volume of molecular biology data is growing at an astonishing rate.
In recognition of the essential and growing role of bioinformatics in the United States, the National Center for Biotechnology Information (NCBI) was created by act of Congress in November 1988. This law mandates that NCBI shall:
1) Create automated systems for storing and analyzing knowledge about molecular biology, biochemistry, and genetics.
2) Perform research into advanced methods of analyzing and interpreting molecular biology data.
3) Enable biotechnology researchers and medical care personnel to use the systems and methods developed.
4) Coordinate efforts to gather biotechnology information worldwide.
To approach these goals, NCBI has been organized into three interoperating branches. The Basic Research Branch (BRB) is a group of scientists who perform research into algorithms and methods for analyzing molecular biology data and publish their results in peer-reviewed journals; the BRB also keeps the other branches abreast of the latest developments from a scientific perspective. The Information Resources Branch (IRB) maintains the infrastructure at NCBI, administers the distribution of data and services provided by NCBI to the community, supports a visiting scientist program that enables researchers to spend time working at NCBI, and interacts with other agencies and bodies. The Information Engineering Branch (IEB) designs and builds databases and software tools for molecular biology information that attempt to incorporate the new approaches and meet the needs of the BRB; the resulting data and software tools are released to the community on a production basis by the IRB.
This document describes the data model and software tools developed by the IEB to achieve its mission. The IEB approaches its task with an understanding of the situation outlined above: molecular biology data come from, and are used in, an extremely heterogeneous, distributed, and changing environment, from both the computing and the biological points of view. The data processed and integrated by the IEB will come from many different sources, which may use different models of the data, and those models can be expected to change over time. The data will be stored and managed on many different computer systems using many different database management systems. The data themselves are expected to remain valuable for longer than the life cycle of any particular computer system or program. The data must therefore be described in a controlled and formal way, so that all participants can clearly understand which data components are available in common at any time, without dependence on any particular software tool or language, database management system, or hardware architecture.
Software developed by the IEB must be capable of running on all major hardware platforms used in the scientific community and must be designed to be ported to new systems as the computer industry progresses. It must provide systems for data retrieval by end-user scientists while also providing software hooks for programs written by bioinformatics specialists and researchers in commercial, academic, or government settings.
To achieve the goal of a formal, controlled, yet flexible data specification, the IEB has adopted Abstract Syntax Notation One (ASN.1), an International Organization for Standardization standard (ISO 8824 and ISO 8825) for describing and encoding data in a machine-readable way that is independent of hardware or software architecture and language. The IEB has created a formal specification in ASN.1 for biotechnology and bibliographic information. This specification is based on a data model that unifies sequence-related data, from bands on a gel to genetic maps to sequenced nucleic acid and protein molecules. It provides connections from such data to other specialized datasets such as stock center lists, taxonomies, or structures. The specification is written as a series of connected modules, so selected modules can be reused by other biotechnology databases and new modules can be added to meet specialized needs. The ASN.1 specification and encoding provide an essential common ground, replacing the many-to-many mapping between the various information sources and applications with a many-to-one mapping, both for data models and for software interfaces.
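As a rough illustration of why the common format helps, count the converters needed, assuming one converter for each direct pairing of a data source with an application, versus one converter per source and one per application when everything passes through ASN.1. S and A are illustrative symbols and the numbers are an arbitrary example, not NCBI figures:

```latex
N_{\text{direct}} = S \times A, \qquad N_{\text{common}} = S + A;
\qquad S = 10,\ A = 20 \;\Rightarrow\; N_{\text{direct}} = 200,\ N_{\text{common}} = 30
```

Under the same assumption, adding one new application costs one new converter through the common format, instead of S converters (one per source) under direct pairing.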
To achieve the goals of software portability and of providing different levels of access, from database producer to programmer to end user, the IEB has developed a layered software toolkit. The toolkit is used internally at NCBI to process and analyze data from a variety of sources to build and maintain the unified databases, and it also supplies the components of the end-user applications NCBI distributes. It is therefore subjected to the continuous demands for quality and performance imposed by a large production operation in the course of our daily work. The source code for the toolkit is made available without restriction to anyone wishing to take advantage of the work done by NCBI. The software runs on a wide variety of common platforms and is layered to allow programmers to use either very low-level or very high-level tools to access and manipulate data.
A brief introduction to the ASN.1 language itself is provided at the beginning of the AsnLib chapter. Those familiar with Backus-Naur form should have no trouble reading it immediately, while others may need a short explanation. ASN.1 is a simple, logical way to specify data and is used for many purposes in the computer industry to describe and exchange data. A number of books, articles, and software tools from the computer industry at large are readily available for those who want a more in-depth knowledge of ASN.1. This availability was an important reason for choosing ASN.1 to describe biological data. ASN.1 is a formal data description language, developed, tested, and used within the computer industry, not an ad hoc file format developed by biologists. Would you program in an ad hoc programming language developed by biologists? Then why describe data that way?
The selection of a data description language does not define what it is used for any more than the selection of English defines what a book is about. The IEB has defined a model for biotechnology information (which happens to be specified in ASN.1) that is centered on the concept of a biological sequence as a simple, linear coordinate system. Genetic and physical maps, sequenced pieces of nucleic acids and proteins, and complex assemblies of such components can all be considered specializations of the basic sequence concept of an identified coordinate system. Relationships between sequences (e.g. sequence alignments, sequence assemblies, relationships of genetic to physical maps) can all be considered mappings from one sequence coordinate system to another. Information about sequences can be considered mappings of specialized data objects (e.g. publications, genes, coding regions) to any sequence coordinate system. Such specialized data objects may themselves contain keys to other databases holding more specialized information that is not necessarily captured by the common data model but is unique to a particular organism, discipline, or database.
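To make the coordinate-system view concrete, here is a sketch of the simplest such mapping: a single ungapped aligned block that maps positions on one sequence onto another. The type AlignedBlock and the function map_a_to_b are hypothetical illustrations, not toolkit structures (the toolkit's own alignment types are described in the Sequence Alignments chapter), and positions are assumed to start at 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for illustration only; not an NCBI toolkit structure.
 * Each sequence is a linear coordinate system, and one ungapped aligned
 * block relates a run of positions on sequence A to a run on sequence B. */
typedef struct {
    long start_a;  /* first aligned position on sequence A (0-based) */
    long start_b;  /* first aligned position on sequence B (0-based) */
    long len;      /* number of aligned positions */
} AlignedBlock;

/* Map a position in A's coordinate system into B's coordinate system.
 * Returns -1 when the position falls outside the aligned block. */
long map_a_to_b(const AlignedBlock *blk, long pos_a)
{
    assert(blk != NULL);
    if (pos_a < blk->start_a || pos_a >= blk->start_a + blk->len)
        return -1;
    return blk->start_b + (pos_a - blk->start_a);
}
```

For a block aligning positions 100 to 149 of A with positions 10 to 59 of B, a feature at positions 120 to 129 on A maps to 30 to 39 on B. A gapped alignment can be viewed as a list of such blocks.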
CoreLib is a small set of "C" language functions, macros, and guidelines that permit the writing of programs that compile and execute without change on over fourteen different hardware/operating system/compiler combinations. If one wishes to distribute one's code to as many molecular biologists as possible with as little work as possible, learning to write CoreLib-style code is a tremendous advantage. If one wishes to write on only one platform but interface with NCBI software, one should still understand the CoreLib approach (read the introduction in the CoreLib chapter), although one need not write CoreLib code oneself.
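The sketch below shows, in miniature, the kind of isolation layer this approach relies on: platform differences are resolved once, in a header, so application code compiles unchanged everywhere. All names here (Port_Int4, Port_MemNew, Port_MemFree) are hypothetical and are not CoreLib's actual typedefs or functions, which are documented in the CoreLib chapter.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical portability layer in the spirit of CoreLib; these names are
 * invented for this sketch and are not CoreLib's real API. */

/* A signed integer holding at least 32 bits on every platform: use int
 * where it is wide enough, otherwise long (guaranteed >= 32 bits). */
#if INT_MAX >= 2147483647
typedef int  Port_Int4;
#else
typedef long Port_Int4;
#endif

/* Allocate zero-filled memory. A zero-byte request returns NULL so that
 * callers never depend on platform-specific malloc(0) behavior. */
void *Port_MemNew(size_t size)
{
    if (size == 0)
        return NULL;
    return calloc(1, size);
}

/* Free memory and return NULL, so a caller can write p = Port_MemFree(p)
 * and never keep a dangling pointer. free(NULL) is a safe no-op. */
void *Port_MemFree(void *ptr)
{
    free(ptr);
    return NULL;
}
```

Application code would then declare Port_Int4 variables and call Port_MemNew() rather than using int or malloc() directly, so porting to a new compiler touches only the portability header.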
AsnLib is a function library written with CoreLib that provides functions for reading and validating ASN.1 specifications and for generating parse trees used to encode and decode data conforming to a specification. The parse trees can be generated dynamically at run time from any input specification, or parse trees for particular specifications can be produced as "C" language header files to be incorporated into applications. Given a parse tree generated either way, AsnLib provides low-level functions for encoding and decoding data in either the text or the binary form of ASN.1, one element at a time. Converters to other languages (ASN.1 to Prolog and ASN.1 to LISP have been done), filters (e.g. get all journal titles from an ASN.1-encoded stream of bibliographic citations), or indexing programs (e.g. index a file of ASN.1-encoded bibliographic citations on author name) can be written with tools at this level.
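AsnLib's own functions are described in the AsnLib chapter; the sketch below is not AsnLib code. It only illustrates what encoding a single element in the binary form of ASN.1 involves, using the Basic Encoding Rules (BER) of ISO 8825 for one INTEGER: a tag byte, a length byte, and the value's minimal two's-complement bytes. The function name ber_encode_integer is hypothetical, and the sketch handles only short-form (single-byte) lengths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not AsnLib's API: BER-encode one ASN.1 INTEGER as
 * tag 0x02, a one-byte length, then the minimal big-endian two's-complement
 * contents. buf needs room for 2 + sizeof(long) bytes.
 * Returns the number of bytes written. */
size_t ber_encode_integer(long value, unsigned char *buf)
{
    unsigned long u = (unsigned long)value;  /* two's-complement bit pattern */
    unsigned char bytes[sizeof(long)];
    size_t n = sizeof(long), start = 0, i;

    /* Big-endian bytes of the full-width value. */
    for (i = 0; i < n; i++)
        bytes[i] = (unsigned char)((u >> (8 * (n - 1 - i))) & 0xFF);

    /* Drop leading bytes that only repeat the sign bit (minimal form). */
    while (start < n - 1 &&
           ((bytes[start] == 0x00 && !(bytes[start + 1] & 0x80)) ||
            (bytes[start] == 0xFF &&  (bytes[start + 1] & 0x80))))
        start++;

    buf[0] = 0x02;                        /* universal tag for INTEGER */
    buf[1] = (unsigned char)(n - start);  /* short-form length */
    for (i = start; i < n; i++)
        buf[2 + i - start] = bytes[i];
    return 2 + (n - start);
}
```

For example, the value 300 encodes as the bytes 02 02 01 2C, and 128 needs a leading zero byte (02 02 00 80) so it is not read as negative; in the text form of ASN.1 the same values are simply written as 300 and 128.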
Every ASN.1 specification module in the NCBI data model has a corresponding "object loader" module. This is a pair of "C" language ".c" and ".h" files that typedef a "C" structure for every entity defined in ASN.1 (called an "object" here). For each object there is a function to create it, read it from an ASN.1 stream, write it to an ASN.1 stream, and free it. These take the form of [AsnName]New(),