NCBI C Toolkit Cross Reference

C/asn/insdseq.asn


  1 --$Revision: 1.7 $
  2 --************************************************************************
  3 --
  4 -- ASN.1 and XML for the components of a GenBank/EMBL/DDBJ sequence record
  5 -- The International Nucleotide Sequence Database (INSD) collaboration
  6 -- Version 1.5, 15 January 2009
  7 --
  8 --************************************************************************
  9 
 10 INSD-INSDSeq DEFINITIONS ::=
 11 BEGIN
 12 
 13 --  INSDSeq provides the elements of a sequence as presented in the
 14 --    GenBank/EMBL/DDBJ-style flatfile formats, with a small amount of
 15 --    additional structure.
 16 --    Although this single perspective of the three flatfile formats
 17 --    provides a useful simplification, it hides to some extent the
 18 --    details of the actual data underlying those formats. Nevertheless,
 19 --    the XML version of INSD-Seq is being provided with
 20 --    the hopes that it will prove useful to those who bulk-process
 21 --    sequence data at the flatfile-format level of detail. Further 
 22 --    documentation regarding the content and conventions of those formats 
 23 --    can be found at:
 24 --
 25 --    URLs for the DDBJ, EMBL, and GenBank Feature Table Document:
 26 --    http://www.ddbj.nig.ac.jp/FT/full_index.html
 27 --    http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
 28 --    http://www.ncbi.nlm.nih.gov/projects/collab/FT/index.html
 29 --
 30 --    URLs for the DDBJ, EMBL, and GenBank Release Notes:
 31 --    ftp://ftp.ddbj.nig.ac.jp/database/ddbj/ddbjrel.txt
 32 --    http://www.ebi.ac.uk/embl/Documentation/Release_notes/current/relnotes.html
 33 --    ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt
 34 --
 35 --    Because INSDSeq is a compromise, a number of pragmatic decisions have
 36 --    been made:
 37 --
 38 --  In pursuit of simplicity and familiarity a number of fields do not
 39 --    have full substructure defined here where there is already a
 40 --    standard flatfile format string. For example:
 41 --
 42 --   Dates:      DD-MON-YYYY (eg 10-JUN-2003)
 43 --
 44 --   Author:     LastName, Initials  (eg Smith, J.N.)
 45 --            or Lastname Initials   (eg Smith J.N.)
 46 --
 47 --   Journal:    JournalName Volume (issue), page-range (year)
 48 --            or JournalName Volume(issue):page-range(year)
 49 --            eg Appl. Environ. Microbiol. 61 (4), 1646-1648 (1995)
 50 --               Appl. Environ. Microbiol. 61(4):1646-1648(1995).
 51 --
 52 --  FeatureLocations are represented as in the flatfile feature table,
 53 --    but FeatureIntervals may also be provided as a convenience.
 54 --
 55 --  FeatureQualifiers are represented as in the flatfile feature table.
 56 --
 57 --  Primary has a string that represents a table to construct
 58 --    a third party (TPA) sequence.
 59 --
 60 --  other-seqids can have strings with the "vertical bar format" sequence
 61 --    identifiers used in BLAST for example, when they are non-INSD types.
 62 --
 63 --  The flatfile format currently shows only accession numbers, but other
 64 --    identifier types, such as patents and submitter clone names, will
 65 --    appear here.
 66 --
 67 --  There are also a number of elements that could have been more exactly
 68 --    specified, but in the interest of simplicity have been simply left as
 69 --    optional. For example:
 70 --
 71 --  All publicly accessible sequence records in INSDSeq format will
 72 --    include accession and accession.version. However, these elements are 
 73 --    optional in INSDSeq so that this format can also be used
 74 --    for non-public sequence data, prior to the assignment of accessions and 
 75 --    version numbers. In such cases, records will have only "other-seqids".
 76 --
 77 --  Sequences will normally all have "sequence" filled in. But contig records
 78 --    will have a "join" statement in the "contig" slot, and no "sequence".
 79 --    We also may consider a retrieval option with no sequence of any kind
 80 --    and no feature table to quickly check minimal values.
 81 --
 82 --  Four (optional) elements are specific to records represented via the EMBL
 83 --    sequence database: INSDSeq_update-release, INSDSeq_create-release,
 84 --    INSDSeq_entry-version, and INSDSeq_database-reference.
 85 --
 86 --  One (optional) element is specific to records originating at the GenBank
 87 --    and DDBJ sequence databases: INSDSeq_segment.
 88 --
 89 --********
 90 
 91 INSDSet ::= SEQUENCE OF INSDSeq
 92 
 93 INSDSeq ::= SEQUENCE {
 94     locus VisibleString ,
 95     length INTEGER ,
 96     strandedness VisibleString OPTIONAL ,
 97     moltype VisibleString ,
 98     topology VisibleString OPTIONAL ,
 99     division VisibleString ,
100     update-date VisibleString ,
101     create-date VisibleString OPTIONAL ,
102     update-release VisibleString OPTIONAL ,
103     create-release VisibleString OPTIONAL ,
104     definition VisibleString ,
105     primary-accession VisibleString OPTIONAL ,
106     entry-version VisibleString OPTIONAL ,
107     accession-version VisibleString OPTIONAL ,
108     other-seqids SEQUENCE OF INSDSeqid OPTIONAL ,
109     secondary-accessions SEQUENCE OF INSDSecondary-accn OPTIONAL,
110     project VisibleString OPTIONAL ,
111     keywords SEQUENCE OF INSDKeyword OPTIONAL ,
112     segment VisibleString OPTIONAL ,
113     source VisibleString OPTIONAL ,
114     organism VisibleString OPTIONAL ,
115     taxonomy VisibleString OPTIONAL ,
116     references SEQUENCE OF INSDReference OPTIONAL ,
117     comment VisibleString OPTIONAL ,
118     tagset INSDTagset OPTIONAL ,
119     primary VisibleString OPTIONAL ,
120     source-db VisibleString OPTIONAL ,
121     database-reference VisibleString OPTIONAL ,
122     feature-table SEQUENCE OF INSDFeature OPTIONAL ,
123     sequence VisibleString OPTIONAL ,  -- Optional for other dump forms
124     contig VisibleString OPTIONAL
125 }
126 
127 INSDSeqid ::= VisibleString
128 
129 INSDSecondary-accn ::= VisibleString
130 
131 INSDKeyword ::= VisibleString
132 
133 -- INSDReference_position contains a string value indicating the
134 -- basepair span(s) to which a reference applies. The allowable
135 -- formats are:
136 -- 
137 --   X..Y  : Where X and Y are integers separated by two periods,
138 --           X >= 1 , Y <= sequence length, and X <= Y 
139 --
140 --           Multiple basepair spans can exist, separated by a
141 --           semi-colon and a space. For example : 10..20; 100..500
142 --             
143 --   sites : The string literal 'sites', indicating that a reference
144 --           provides sequence annotation information, but the specific
145 --           basepair spans are either not captured, or were too numerous
146 --           to record.
147 -- 
148 --           The 'sites' literal string is singly occurring, and
149 --           cannot be used in conjunction with any X..Y basepair spans.
150 -- 
151 --   References that lack an INSDReference_position element apply
152 --   to the entire sequence.
153 
154 INSDAuthor ::= VisibleString
155 
156 INSDReference ::= SEQUENCE {
157     reference VisibleString ,
158     position VisibleString OPTIONAL ,
159     authors SEQUENCE OF INSDAuthor OPTIONAL ,
160     consortium VisibleString OPTIONAL ,
161     title VisibleString OPTIONAL ,
162     journal VisibleString ,
163     xref SET OF INSDXref OPTIONAL ,
164     pubmed INTEGER OPTIONAL ,
165     remark VisibleString OPTIONAL
166 }
167 
168 -- INSDXref provides a method for referring to records in
169 -- other databases. INSDXref_dbname is a string value that
170 -- provides the name of the database, and INSDXref_id
171 -- is a string value that provides the record's identifier
172 -- in that database.
173 
174 INSDXref ::= SEQUENCE {
175     dbname VisibleString ,
176     id VisibleString
177 }
178 
179 -- INSDTagset is used for community-specific data elements
180 -- in a tag/value format.
181 
182 INSDTagset ::= SEQUENCE {
183     authority VisibleString OPTIONAL ,
184     version VisibleString OPTIONAL ,
185     url VisibleString OPTIONAL ,
186     tags INSDTags OPTIONAL
187 }
188 
189 INSDTags ::= SEQUENCE OF INSDTag
190 
191 INSDTag ::= SEQUENCE {
192     name VisibleString OPTIONAL ,
193     value VisibleString OPTIONAL ,
194     unit VisibleString OPTIONAL
195 }
196 
197 -- INSDFeature_operator contains a string value describing
198 -- the relationship among a set of INSDInterval within
199 -- INSDFeature_intervals. The allowable formats are:
200 -- 
201 --   join :  The string literal 'join' indicates that the
202 --           INSDInterval intervals are biologically joined
203 --           together into a contiguous molecule.
204 -- 
205 --   order : The string literal 'order' indicates that the
206 --           INSDInterval intervals are in the presented
207 --           order, but they are not necessarily contiguous.
208 -- 
209 --   Either 'join' or 'order' is required if INSDFeature_intervals
210 --   contains more than one INSDInterval.
211 
212 INSDFeature ::= SEQUENCE {
213     key VisibleString ,
214     location VisibleString ,
215     intervals SEQUENCE OF INSDInterval OPTIONAL ,
216     operator VisibleString OPTIONAL ,
217     partial5 BOOLEAN OPTIONAL ,
218     partial3 BOOLEAN OPTIONAL ,
219     quals SEQUENCE OF INSDQualifier OPTIONAL
220 }
221 
222 -- INSDInterval_iscomp is a boolean indicating whether
223 -- an INSDInterval_from / INSDInterval_to location
224 -- represents a location on the complement strand.
225 -- When INSDInterval_iscomp is TRUE, it essentially
226 -- confirms that a 'from' value which is greater than
227 -- a 'to' value is intentional, because the location
228 -- is on the opposite strand of the presented sequence.
229 
230 -- INSDInterval_interbp is a boolean indicating whether
231 -- a feature (such as a restriction site) is located
232 -- between two adjacent basepairs. When INSDInterval_interbp
233 -- is TRUE, the 'from' and 'to' values must differ by
234 -- exactly one base.
235 
236 INSDInterval ::= SEQUENCE {
237     from INTEGER OPTIONAL ,
238     to INTEGER OPTIONAL ,
239     point INTEGER OPTIONAL ,
240     iscomp BOOLEAN OPTIONAL ,
241     interbp BOOLEAN OPTIONAL ,
242     accession VisibleString
243 }
244 
245 INSDQualifier ::= SEQUENCE {
246     name VisibleString ,
247     value VisibleString OPTIONAL
248 }
249 
250 -- INSDTagsetRules defines mandatory, optional, and unique tags
251 -- for a given community's INSDTagset. If the tagset is extensible,
252 -- then additional tags which are not included in the list of
253 -- mandatory or optional tags may be present. The uniquetags
254 -- element provides a list of the tags that may occur only once
255 -- in a given tagset.
256 
257 INSDTagsetRules ::= SEQUENCE {
258     authority VisibleString OPTIONAL ,
259     version VisibleString OPTIONAL ,
260     mandatorytags INSDTagNames OPTIONAL ,
261     optionaltags INSDTagNames OPTIONAL ,
262     uniquetags INSDTagNames OPTIONAL ,
263     extensible BOOLEAN OPTIONAL
264 }
265 
266 INSDTagNames ::= SEQUENCE OF VisibleString
267 
268 INSDTagsetRuleSet ::= SEQUENCE OF INSDTagsetRules
269 
270 END
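
The INSDReference_position rules in the comments above (one or more X..Y spans separated by "; ", or the single literal 'sites', or no element at all) can be checked mechanically. The following is a minimal sketch in Python, not part of the toolkit: the function name parse_reference_position and its seq_length parameter are hypothetical, and it assumes the position string has already been extracted from the record.

```python
import re
from typing import List, Optional, Tuple

# One span: two integers separated by exactly two periods, e.g. "10..20".
_SPAN = re.compile(r"(\d+)\.\.(\d+)")


def parse_reference_position(
    text: Optional[str], seq_length: int
) -> Optional[List[Tuple[int, int]]]:
    """Parse an INSDReference_position string (hypothetical helper).

    Returns a list of (X, Y) spans, or None for the 'sites' literal.
    A missing position (text is None) applies to the entire sequence.
    """
    if text is None:
        return [(1, seq_length)]
    if text == "sites":
        # 'sites' occurs alone; the spans are not recorded.
        return None
    spans = []
    # Multiple spans are separated by a semicolon and a space.
    for part in text.split("; "):
        match = _SPAN.fullmatch(part)
        if match is None:
            raise ValueError(f"not an X..Y span: {part!r}")
        x, y = int(match.group(1)), int(match.group(2))
        # X >= 1, Y <= sequence length, and X <= Y.
        if not (1 <= x <= y <= seq_length):
            raise ValueError(f"span out of range: {part!r}")
        spans.append((x, y))
    return spans
```

Because 'sites' is only accepted as the whole string, a mixed value such as "sites; 10..20" is rejected as an invalid span, which matches the rule that 'sites' cannot be combined with X..Y spans.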
271 
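
The constraints stated in the INSDFeature_operator, INSDInterval_iscomp, and INSDInterval_interbp comments can likewise be expressed as simple checks. This is a rough sketch under stated assumptions: the Interval class and the check_interval and check_feature functions are hypothetical names, from_ stands in for the ASN.1 field "from" (a Python keyword), and treating from > to without iscomp as a problem is an interpretation of the comment rather than a documented rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interval:
    """Hypothetical mirror of the INSDInterval fields."""
    accession: str
    from_: Optional[int] = None   # ASN.1 field "from"
    to: Optional[int] = None
    point: Optional[int] = None
    iscomp: bool = False
    interbp: bool = False


def check_interval(iv: Interval) -> List[str]:
    """Return a list of problems found in one interval (empty if none)."""
    problems = []
    if iv.from_ is not None and iv.to is not None:
        # interbp: 'from' and 'to' must differ by exactly one base.
        if iv.interbp and abs(iv.from_ - iv.to) != 1:
            problems.append("interbp is set but from/to do not differ by exactly one")
        # Assumption: from > to is only intentional when iscomp is TRUE.
        if iv.from_ > iv.to and not iv.iscomp:
            problems.append("from is greater than to but iscomp is not set")
    return problems


def check_feature(intervals: List[Interval], operator: Optional[str]) -> List[str]:
    """Check the operator rule, then each interval in turn."""
    problems = []
    # 'join' or 'order' is required when there is more than one interval.
    if len(intervals) > 1 and operator not in ("join", "order"):
        problems.append("more than one interval requires operator 'join' or 'order'")
    for iv in intervals:
        problems.extend(check_interval(iv))
    return problems
```

For example, two intervals passed with operator None yield the single operator problem, while the same intervals with "join" pass cleanly.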
