--$Revision: 1.7 $
--************************************************************************
--
-- ASN.1 and XML for the components of a GenBank/EMBL/DDBJ sequence record
-- The International Nucleotide Sequence Database (INSD) collaboration
-- Version 1.5, 15 January 2009
--
--************************************************************************

INSD-INSDSeq DEFINITIONS ::=
BEGIN

-- INSDSeq provides the elements of a sequence as presented in the
-- GenBank/EMBL/DDBJ-style flatfile formats, with a small amount of
-- additional structure.
-- Although this single perspective of the three flatfile formats
-- provides a useful simplification, it hides to some extent the
-- details of the actual data underlying those formats. Nevertheless,
-- the XML version of INSD-Seq is being provided in
-- the hope that it will prove useful to those who bulk-process
-- sequence data at the flatfile-format level of detail. Further
-- documentation regarding the content and conventions of those formats
-- can be found at:
--
-- URLs for the DDBJ, EMBL, and GenBank Feature Table Document:
--   http://www.ddbj.nig.ac.jp/FT/full_index.html
--   http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
--   http://www.ncbi.nlm.nih.gov/projects/collab/FT/index.html
--
-- URLs for the DDBJ, EMBL, and GenBank Release Notes:
--   ftp://ftp.ddbj.nig.ac.jp/database/ddbj/ddbjrel.txt
--   http://www.ebi.ac.uk/embl/Documentation/Release_notes/current/relnotes.html
--   ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt
--
-- Because INSDSeq is a compromise, a number of pragmatic decisions have
-- been made:
--
-- In pursuit of simplicity and familiarity, a number of fields do not
-- have full substructure defined here where there is already a
-- standard flatfile format string. For example:
--
--   Dates:   DD-MON-YYYY (e.g. 10-JUN-2003)
--
--   Author:  LastName, Initials (e.g. Smith, J.N.)
--            or Lastname Initials (e.g. Smith J.N.)
--
--   Journal: JournalName Volume (issue), page-range (year)
--            or JournalName Volume(issue):page-range(year)
--            e.g. Appl. Environ. Microbiol. 61 (4), 1646-1648 (1995)
--                 Appl. Environ. Microbiol. 61(4):1646-1648(1995).
--
-- FeatureLocations are represented as in the flatfile feature table,
-- but FeatureIntervals may also be provided as a convenience.
--
-- FeatureQualifiers are represented as in the flatfile feature table.
--
-- Primary has a string that represents a table to construct
-- a third party (TPA) sequence.
--
-- other-seqids can have strings with the "vertical bar format" sequence
-- identifiers used in BLAST, for example, when they are non-INSD types.
--
-- Currently in flatfile format you only see Accession numbers, but there
-- are others, like patents, submitter clone names, etc., which will
-- appear here.
--
-- There are also a number of elements that could have been more exactly
-- specified, but in the interest of simplicity have been simply left as
-- optional. For example:
--
-- All publicly accessible sequence records in INSDSeq format will
-- include accession and accession.version. However, these elements are
-- optional in INSDSeq so that this format can also be used
-- for non-public sequence data, prior to the assignment of accessions and
-- version numbers. In such cases, records will have only "other-seqids".
--
-- Sequences will normally all have "sequence" filled in. But contig records
-- will have a "join" statement in the "contig" slot, and no "sequence".
-- We may also consider a retrieval option with no sequence of any kind
-- and no feature table to quickly check minimal values.
--
-- Four (optional) elements are specific to records represented via the EMBL
-- sequence database: INSDSeq_update-release, INSDSeq_create-release,
-- INSDSeq_entry-version, and INSDSeq_database-reference.
--
-- One (optional) element is specific to records originating at the GenBank
-- and DDBJ sequence databases: INSDSeq_segment.
--
--********

INSDSet ::= SEQUENCE OF INSDSeq

INSDSeq ::= SEQUENCE {
    locus VisibleString ,
    length INTEGER ,
    strandedness VisibleString OPTIONAL ,
    moltype VisibleString ,
    topology VisibleString OPTIONAL ,
    division VisibleString ,
    update-date VisibleString ,
    create-date VisibleString OPTIONAL ,
    update-release VisibleString OPTIONAL ,
    create-release VisibleString OPTIONAL ,
    definition VisibleString ,
    primary-accession VisibleString OPTIONAL ,
    entry-version VisibleString OPTIONAL ,
    accession-version VisibleString OPTIONAL ,
    other-seqids SEQUENCE OF INSDSeqid OPTIONAL ,
    secondary-accessions SEQUENCE OF INSDSecondary-accn OPTIONAL ,
    project VisibleString OPTIONAL ,
    keywords SEQUENCE OF INSDKeyword OPTIONAL ,
    segment VisibleString OPTIONAL ,
    source VisibleString OPTIONAL ,
    organism VisibleString OPTIONAL ,
    taxonomy VisibleString OPTIONAL ,
    references SEQUENCE OF INSDReference OPTIONAL ,
    comment VisibleString OPTIONAL ,
    tagset INSDTagset OPTIONAL ,
    primary VisibleString OPTIONAL ,
    source-db VisibleString OPTIONAL ,
    database-reference VisibleString OPTIONAL ,
    feature-table SEQUENCE OF INSDFeature OPTIONAL ,
    sequence VisibleString OPTIONAL , -- Optional for other dump forms
    contig VisibleString OPTIONAL
}
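
-- Illustrative sketch only (not part of the INSD specification): a minimal
-- INSDSeq value in ASN.1 value notation for a non-public record, assuming
-- that only the mandatory elements plus "other-seqids" and "sequence" are
-- filled in. Every string below is hypothetical, including the
-- "vertical bar format" identifier.
--
--   {
--     locus "EXAMPLE01" ,
--     length 12 ,
--     moltype "DNA" ,
--     division "BCT" ,
--     update-date "10-JUN-2003" ,
--     definition "Hypothetical unreleased sequence" ,
--     other-seqids { "gnl|EXAMPLEDB|clone-1" } ,
--     sequence "acgtacgtacgt"
--   }
--
-- A contig record would, under the same assumptions, omit "sequence" and
-- carry a "join" statement in "contig" instead.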

INSDSeqid ::= VisibleString

INSDSecondary-accn ::= VisibleString

INSDKeyword ::= VisibleString

-- INSDReference_position contains a string value indicating the
-- basepair span(s) to which a reference applies. The allowable
-- formats are:
--
--   X..Y  : Where X and Y are integers separated by two periods,
--           X >= 1 , Y <= sequence length, and X <= Y
--
--           Multiple basepair spans can exist, separated by a
--           semicolon and a space. For example: 10..20; 100..500
--
--   sites : The string literal 'sites', indicating that a reference
--           provides sequence annotation information, but the specific
--           basepair spans are either not captured, or were too numerous
--           to record.
--
--           The 'sites' literal string is singly occurring, and
--           cannot be used in conjunction with any X..Y basepair spans.
--
-- References that lack an INSDReference_position element apply
-- to the entire sequence.

INSDAuthor ::= VisibleString

INSDReference ::= SEQUENCE {
    reference VisibleString ,
    position VisibleString OPTIONAL ,
    authors SEQUENCE OF INSDAuthor OPTIONAL ,
    consortium VisibleString OPTIONAL ,
    title VisibleString OPTIONAL ,
    journal VisibleString ,
    xref SET OF INSDXref OPTIONAL ,
    pubmed INTEGER OPTIONAL ,
    remark VisibleString OPTIONAL
}
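
-- Illustrative sketch only: an INSDReference value whose "position" uses
-- the multi-span X..Y form described in the INSDReference_position notes.
-- The reference label and span values are hypothetical; the author and
-- journal strings reuse the format examples given near the top of this
-- file.
--
--   {
--     reference "1" ,
--     position "10..20; 100..500" ,
--     authors { "Smith, J.N." } ,
--     journal "Appl. Environ. Microbiol. 61 (4), 1646-1648 (1995)"
--   }
--
-- A reference whose spans were not recorded would instead carry
-- position "sites"; leaving "position" out applies the reference to the
-- entire sequence.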

-- INSDXref provides a method for referring to records in
-- other databases. INSDXref_dbname is a string value that
-- provides the name of the database, and INSDXref_id
-- is a string value that provides the record's identifier
-- in that database.

INSDXref ::= SEQUENCE {
    dbname VisibleString ,
    id VisibleString
}
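
-- Illustrative sketch only: an INSDXref value. Both strings are
-- hypothetical placeholders, not real database names or identifiers.
--
--   { dbname "ExampleDB" , id "EX0001" }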

-- INSDTagset is used for community-specific data elements
-- in a tag/value format.

INSDTagset ::= SEQUENCE {
    authority VisibleString OPTIONAL ,
    version VisibleString OPTIONAL ,
    url VisibleString OPTIONAL ,
    tags INSDTags OPTIONAL
}

INSDTags ::= SEQUENCE OF INSDTag

INSDTag ::= SEQUENCE {
    name VisibleString OPTIONAL ,
    value VisibleString OPTIONAL ,
    unit VisibleString OPTIONAL
}
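
-- Illustrative sketch only: an INSDTagset value holding one tag/value
-- pair with a unit. The authority, version, and tag contents are all
-- hypothetical and do not describe any real community tagset.
--
--   {
--     authority "Example-Consortium" ,
--     version "1.0" ,
--     tags {
--       { name "collection-depth" , value "25" , unit "m" }
--     }
--   }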

-- INSDFeature_operator contains a string value describing
-- the relationship among a set of INSDInterval within
-- INSDFeature_intervals. The allowable formats are:
--
--   join  : The string literal 'join' indicates that the
--           INSDInterval intervals are biologically joined
--           together into a contiguous molecule.
--
--   order : The string literal 'order' indicates that the
--           INSDInterval intervals are in the presented
--           order, but they are not necessarily contiguous.
--
-- Either 'join' or 'order' is required if INSDFeature_intervals
-- contains more than one INSDInterval.

INSDFeature ::= SEQUENCE {
    key VisibleString ,
    location VisibleString ,
    intervals SEQUENCE OF INSDInterval OPTIONAL ,
    operator VisibleString OPTIONAL ,
    partial5 BOOLEAN OPTIONAL ,
    partial3 BOOLEAN OPTIONAL ,
    quals SEQUENCE OF INSDQualifier OPTIONAL
}
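
-- Illustrative sketch only: an INSDFeature value with two intervals.
-- Because more than one INSDInterval is present, an operator is required;
-- 'join' is used here. The accession, coordinates, and qualifier value
-- are hypothetical. The "location" string repeats the same spans in
-- flatfile feature table syntax.
--
--   {
--     key "CDS" ,
--     location "join(1..100,200..300)" ,
--     intervals {
--       { from 1 , to 100 , accession "EXAMPLE01.1" } ,
--       { from 200 , to 300 , accession "EXAMPLE01.1" }
--     } ,
--     operator "join" ,
--     quals { { name "gene" , value "exA" } }
--   }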

-- INSDInterval_iscomp is a boolean indicating whether
-- an INSDInterval_from / INSDInterval_to location
-- represents a location on the complement strand.
-- When INSDInterval_iscomp is TRUE, it essentially
-- confirms that a 'from' value which is greater than
-- a 'to' value is intentional, because the location
-- is on the opposite strand of the presented sequence.

-- INSDInterval_interbp is a boolean indicating whether
-- a feature (such as a restriction site) is located
-- between two adjacent basepairs. When INSDInterval_interbp
-- is TRUE, the 'from' and 'to' values must differ by
-- exactly one base.

INSDInterval ::= SEQUENCE {
    from INTEGER OPTIONAL ,
    to INTEGER OPTIONAL ,
    point INTEGER OPTIONAL ,
    iscomp BOOLEAN OPTIONAL ,
    interbp BOOLEAN OPTIONAL ,
    accession VisibleString
}
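
-- Illustrative sketch only: three INSDInterval values showing how the
-- optional elements might combine. The accession and coordinates are
-- hypothetical.
--
--   Complement-strand span ('from' greater than 'to', confirmed by iscomp):
--     { from 300 , to 200 , iscomp TRUE , accession "EXAMPLE01.1" }
--
--   Site between two adjacent basepairs ('from' and 'to' differ by one):
--     { from 122 , to 123 , interbp TRUE , accession "EXAMPLE01.1" }
--
--   Single position:
--     { point 45 , accession "EXAMPLE01.1" }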

INSDQualifier ::= SEQUENCE {
    name VisibleString ,
    value VisibleString OPTIONAL
}

-- INSDTagsetRules defines mandatory, optional, and unique tags
-- for a given community's INSDTagset. If the tagset is extensible,
-- then additional tags which are not included in the list of
-- mandatory or optional tags may be present. The uniquetags
-- element provides a list of the tags that may occur only once
-- in a given tagset.

INSDTagsetRules ::= SEQUENCE {
    authority VisibleString OPTIONAL ,
    version VisibleString OPTIONAL ,
    mandatorytags INSDTagNames OPTIONAL ,
    optionaltags INSDTagNames OPTIONAL ,
    uniquetags INSDTagNames OPTIONAL ,
    extensible BOOLEAN OPTIONAL
}
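
-- Illustrative sketch only: an INSDTagsetRules value for a hypothetical
-- community. All names are placeholders.
--
--   {
--     authority "Example-Consortium" ,
--     version "1.0" ,
--     mandatorytags { "collection-depth" } ,
--     optionaltags { "collector" } ,
--     uniquetags { "collection-depth" } ,
--     extensible TRUE
--   }
--
-- Read against the description above, a conforming tagset would have to
-- include "collection-depth" exactly once, could include "collector",
-- and, since extensible is TRUE, could carry tags not listed here.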

INSDTagNames ::= SEQUENCE OF VisibleString

INSDTagsetRuleSet ::= SEQUENCE OF INSDTagsetRules

END
