NCBI C Toolkit Cross Reference

C/asn/gbseq.asn


--$Revision: 6.7 $
--*********************************************************
--
-- ASN.1 and XML for the components of a GenBank format sequence
-- J.Ostell 2002
-- Updated 15 January 2009
--
--*********************************************************

NCBI-GBSeq DEFINITIONS ::=
BEGIN

--********
--  GBSeq represents the elements in a GenBank style report
--    of a sequence with some small additions to structure and support
--    for protein (GenPept) versions of GenBank format as seen in
--    Entrez. While this represents a simplification, reduction of
--    detail, and flattening to a single sequence perspective of GenBank
--    format (compared with the full ASN.1 or XML from which GenBank and
--    this format are derived at NCBI), it is presented in ASN.1 or XML for
--    automated parsing and processing. It is hoped that this compromise
--    will be useful for those bulk processing at the GenBank format level
--    of detail today. Since it is a compromise, a number of pragmatic
--    decisions have been made.
--
--  In pursuit of simplicity and familiarity a number of
--    fields do not have full substructure defined here where there is
--    already a standard GenBank format string. For example:
--
--   Date  DD-Mon-YYYY
--   Authors   LastName, Initials (with periods)
--   Journal   JournalName Volume (issue), page-range (year)
--   FeatureLocations as per GenBank feature table, but FeatureIntervals
--    may also be provided as a convenience
--   FeatureQualifiers  as per GenBank feature table
--   Primary has a string that represents a table to construct
--    a third party (TPA) sequence.
--   other-seqids can have strings with the "vertical bar format" sequence
--    identifiers used in BLAST, for example, when they are non-GenBank types.
--    Currently in GenBank format you only see GI, but there are others, like
--    patents, submitter clone names, etc., which will appear here, as they
--    always have in the ASN.1 format, and full XML format.
--   source-db is a formatted text block for peptides in GenPept format that
--    carries information from the source protein database.
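--
--  Illustrative strings for some of the conventions above. All of them are
--    invented to match the stated patterns and are not taken from a real
--    record, so exact spacing and letter case in NCBI output may differ:
--
--   Date      "15-Jan-2009"
--   Authors   "Smith, J.A."
--   Journal   "J. Example Biol. 12 (3), 45-67 (2002)"
--   other-seqids  "gnl|EXAMPLEDB|clone-001" or "pat|US|0000000|1"
--
--  A feature that gives both the location string and the equivalent
--    intervals might look like this in ASN.1 value notation (the accession
--    and the qualifier are hypothetical):
--
--   { key "CDS" ,
--     location "1..12" ,
--     intervals { { from 1 , to 12 , accession "XX000001.1" } } ,
--     quals { { name "product" , value "hypothetical protein" } } }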
--
--  There are also a number of elements that could have been
--   more exactly specified, but in the interest of simplicity
--   have been simply left as options. For example:
--
--  accession and accession.version (primary-accession and accession-version
--   below) will always appear in a GenBank record. They are optional because
--   this format can also be used for non-GenBank sequences, which will have
--   only "other-seqids".
--
--  sequences will normally all have "sequence" filled in. But contig records
--    will have a "join" statement in the "contig" slot, and no "sequence".
--    We also may consider a retrieval option with no sequence of any kind
--     and no feature table to quickly check minimal values.
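--
--  As a sketch of the two cases in ASN.1 value notation (all values are
--    invented for illustration, and only the mandatory fields plus
--    "sequence" or "contig" are shown):
--
--    an ordinary record carries the residues directly
--      { locus "EXAMPLE01" , length 12 , moltype "DNA" , division "SYN" ,
--        update-date "15-Jan-2009" , definition "Hypothetical example" ,
--        sequence "acgtacgtacgt" }
--
--    a contig record carries a join statement instead
--      { locus "EXAMPLE02" , length 2100 , moltype "DNA" , division "CON" ,
--        update-date "15-Jan-2009" , definition "Hypothetical contig" ,
--        contig "join(XX000001.1:1..1000,gap(100),XX000002.1:1..1000)" }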
--
--  a reference may have an author list, or be from a consortium, or both.
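--
--  As a sketch of the three cases in ASN.1 value notation (every value
--    below is invented for illustration):
--
--    authors only
--      { reference "1" , authors { "Smith, J.A." , "Jones, B." } ,
--        journal "J. Example Biol. 12 (3), 45-67 (2002)" }
--
--    consortium only
--      { reference "2" , consortium "Hypothetical Sequencing Consortium" ,
--        journal "Unpublished" }
--
--    both
--      { reference "3" , authors { "Smith, J.A." } ,
--        consortium "Hypothetical Sequencing Consortium" ,
--        journal "Unpublished" }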
--
--  some fields, such as taxonomy, do appear as separate elements in GenBank
--    format but without a specific linetype (in GenBank format this comes
--    under ORGANISM). Another example is the separation of primary accession
--    from the list of secondary accessions. In GenBank format primary
--    accession is just the first one on the list that includes all secondaries
--    after it.
--
--  create-date deserves special comment. The date you see on the right hand
--    side of the LOCUS line in GenBank format is actually the last date the
--    record was modified (or the update-date). The date the record was
--    first submitted to GenBank appears in the first submission citation in
--    the reference section. Internally in the databases and ASN.1 NCBI keeps
--    the first date the record was released into the sequence database at
--    NCBI as create-date. For records from EMBL, which supports create-date,
--    it is the date provided by EMBL. For DDBJ records, which do not supply
--    a create-date (same as GenBank format) the create-date is the first date
--    NCBI saw the record from DDBJ. For older GenBank records, before NCBI
--    took responsibility for GenBank, it is just the first date NCBI saw the
--    record. Create-date can be very useful, so we expose it here, but users
--    must understand it is only an approximation and comes from many sources,
--    and with many exceptions and caveats. It does NOT tell you the first
--    date the public might have seen this record and thus is NOT an accurate
--    measure for legal issues of precedence.
--
--********

GBSet ::= SEQUENCE OF GBSeq

GBSeq ::= SEQUENCE {
    locus VisibleString ,
    length INTEGER ,
    strandedness VisibleString OPTIONAL ,
    moltype VisibleString ,
    topology VisibleString OPTIONAL ,
    division VisibleString ,
    update-date VisibleString ,
    create-date VisibleString OPTIONAL ,
    update-release VisibleString OPTIONAL ,
    create-release VisibleString OPTIONAL ,
    definition VisibleString ,
    primary-accession VisibleString OPTIONAL ,
    entry-version VisibleString OPTIONAL ,
    accession-version VisibleString OPTIONAL ,
    other-seqids SEQUENCE OF GBSeqid OPTIONAL ,
    secondary-accessions SEQUENCE OF GBSecondary-accn OPTIONAL ,
    project VisibleString OPTIONAL ,
    keywords SEQUENCE OF GBKeyword OPTIONAL ,
    segment VisibleString OPTIONAL ,
    source VisibleString OPTIONAL ,
    organism VisibleString OPTIONAL ,
    taxonomy VisibleString OPTIONAL ,
    references SEQUENCE OF GBReference OPTIONAL ,
    comment VisibleString OPTIONAL ,
    tagset GBTagset OPTIONAL ,
    primary VisibleString OPTIONAL ,
    source-db VisibleString OPTIONAL ,
    database-reference VisibleString OPTIONAL ,
    feature-table SEQUENCE OF GBFeature OPTIONAL ,
    sequence VisibleString OPTIONAL ,  -- Optional for other dump forms
    contig VisibleString OPTIONAL
}

GBSecondary-accn ::= VisibleString

GBSeqid ::= VisibleString

GBKeyword ::= VisibleString

GBAuthor ::= VisibleString

GBReference ::= SEQUENCE {
    reference VisibleString ,
    position VisibleString OPTIONAL ,
    authors SEQUENCE OF GBAuthor OPTIONAL ,
    consortium VisibleString OPTIONAL ,
    title VisibleString OPTIONAL ,
    journal VisibleString ,
    xref SET OF GBXref OPTIONAL ,
    pubmed INTEGER OPTIONAL ,
    remark VisibleString OPTIONAL
}

GBXref ::= SEQUENCE {
    dbname VisibleString ,
    id VisibleString
}

GBTagset ::= SEQUENCE {
    authority VisibleString OPTIONAL ,
    version VisibleString OPTIONAL ,
    url VisibleString OPTIONAL ,
    tags GBTags OPTIONAL
}

GBTags ::= SEQUENCE OF GBTag

GBTag ::= SEQUENCE {
    name VisibleString OPTIONAL ,
    value VisibleString OPTIONAL ,
    unit VisibleString OPTIONAL
}

GBFeature ::= SEQUENCE {
    key VisibleString ,
    location VisibleString ,
    intervals SEQUENCE OF GBInterval OPTIONAL ,
    operator VisibleString OPTIONAL ,
    partial5 BOOLEAN OPTIONAL ,
    partial3 BOOLEAN OPTIONAL ,
    quals SEQUENCE OF GBQualifier OPTIONAL
}

GBInterval ::= SEQUENCE {
    from INTEGER OPTIONAL ,
    to INTEGER OPTIONAL ,
    point INTEGER OPTIONAL ,
    iscomp BOOLEAN OPTIONAL ,
    interbp BOOLEAN OPTIONAL ,
    accession VisibleString
}

GBQualifier ::= SEQUENCE {
    name VisibleString ,
    value VisibleString OPTIONAL
}

GBTagsetRules ::= SEQUENCE {
    authority VisibleString OPTIONAL ,
    version VisibleString OPTIONAL ,
    mandatorytags GBTagNames OPTIONAL ,
    optionaltags GBTagNames OPTIONAL ,
    uniquetags GBTagNames OPTIONAL ,
    extensible BOOLEAN OPTIONAL
}

GBTagNames ::= SEQUENCE OF VisibleString

GBTagsetRuleSet ::= SEQUENCE OF GBTagsetRules

END