NCBI C Toolkit Cross Reference: C/asn/gbseq.asn
--$Revision: 6.7 $
--*********************************************************
--
-- ASN.1 and XML for the components of a GenBank format sequence
-- J.Ostell 2002
-- Updated 15 January 2009
--
--*********************************************************

NCBI-GBSeq DEFINITIONS ::=
BEGIN

--********
-- GBSeq represents the elements in a GenBank style report
-- of a sequence with some small additions to structure and support
-- for protein (GenPept) versions of GenBank format as seen in
-- Entrez. While this represents the simplification, reduction of
-- detail, and flattening to a single sequence perspective of GenBank
-- format (compared with the full ASN.1 or XML from which GenBank and
-- this format are derived at NCBI), it is presented in ASN.1 or XML for
-- automated parsing and processing. It is hoped that this compromise
-- will be useful for those bulk processing at the GenBank format level
-- of detail today. Since it is a compromise, a number of pragmatic
-- decisions have been made.
--
-- In pursuit of simplicity and familiarity a number of
-- fields do not have full substructure defined here where there is
-- already a standard GenBank format string. For example:
--
--   Date      DD-Mon-YYYY
--   Authors   LastName, Initials (with periods)
--   Journal   JournalName Volume (issue), page-range (year)
--   FeatureLocations as per GenBank feature table, but FeatureIntervals
--     may also be provided as a convenience
--   FeatureQualifiers as per GenBank feature table
--   Primary has a string that represents a table to construct
--     a third party (TPA) sequence.
--   other-seqids can have strings with the "vertical bar format" sequence
--     identifiers used in BLAST for example, when they are non-genbank types.
--     Currently in GenBank format you only see GI, but there are others, like
--     patents, submitter clone names, etc. which will appear here, as they
--     always have in the ASN.1 format, and full XML format.
--   source-db is a formatted text block for peptides in GenPept format that
--     carries information from the source protein database.
--
-- There are also a number of elements that could have been
-- more exactly specified, but in the interest of simplicity
-- have been simply left as options. For example:
--
--   accession and accession.version will always appear in a GenBank record;
--     they are optional because this format can also be used for non-GenBank
--     sequences, and in that case will have only "other-seqids".
--
--   sequences will normally all have "sequence" filled in. But contig records
--     will have a "join" statement in the "contig" slot, and no "sequence".
--     We also may consider a retrieval option with no sequence of any kind
--     and no feature table to quickly check minimal values.
--
--   a reference may have an author list, or be from a consortium, or both.
--
--   some fields, such as taxonomy, do appear as separate elements in GenBank
--     format but without a specific linetype (in GenBank format taxonomy comes
--     under ORGANISM). Another example is the separation of primary accession
--     from the list of secondary accessions. In GenBank format primary
--     accession is just the first one on the list that includes all secondaries
--     after it.
--
--   create-date deserves special comment. The date you see on the right hand
--     side of the LOCUS line in GenBank format is actually the last date
--     the record was modified (or the update-date). The date the record was
--     first submitted to GenBank appears in the first submission citation in
--     the reference section. Internally in the databases and ASN.1 NCBI keeps
--     the first date the record was released into the sequence database at
--     NCBI as create-date. For records from EMBL, which supports create-date,
--     it is the date provided by EMBL. For DDBJ records, which do not supply
--     a create-date (same as GenBank format), the create-date is the first date
--     NCBI saw the record from DDBJ. For older GenBank records, before NCBI
--     took responsibility for GenBank, it is just the first date NCBI saw the
--     record. Create-date can be very useful, so we expose it here, but users
--     must understand it is only an approximation and comes from many sources,
--     and with many exceptions and caveats. It does NOT tell you the first
--     date the public might have seen this record and thus is NOT an accurate
--     measure for legal issues of precedence.
--
--********

GBSet ::= SEQUENCE OF GBSeq

GBSeq ::= SEQUENCE {
    locus VisibleString ,
    length INTEGER ,
    strandedness VisibleString OPTIONAL ,
    moltype VisibleString ,
    topology VisibleString OPTIONAL ,
    division VisibleString ,
    update-date VisibleString ,
    create-date VisibleString OPTIONAL ,
    update-release VisibleString OPTIONAL ,
    create-release VisibleString OPTIONAL ,
    definition VisibleString ,
    primary-accession VisibleString OPTIONAL ,
    entry-version VisibleString OPTIONAL ,
    accession-version VisibleString OPTIONAL ,
    other-seqids SEQUENCE OF GBSeqid OPTIONAL ,
    secondary-accessions SEQUENCE OF GBSecondary-accn OPTIONAL ,
    project VisibleString OPTIONAL ,
    keywords SEQUENCE OF GBKeyword OPTIONAL ,
    segment VisibleString OPTIONAL ,
    source VisibleString OPTIONAL ,
    organism VisibleString OPTIONAL ,
    taxonomy VisibleString OPTIONAL ,
    references SEQUENCE OF GBReference OPTIONAL ,
    comment VisibleString OPTIONAL ,
    tagset GBTagset OPTIONAL ,
    primary VisibleString OPTIONAL ,
    source-db VisibleString OPTIONAL ,
    database-reference VisibleString OPTIONAL ,
    feature-table SEQUENCE OF GBFeature OPTIONAL ,
    sequence VisibleString OPTIONAL ,  -- Optional for other dump forms
    contig VisibleString OPTIONAL
}

GBSecondary-accn ::= VisibleString

GBSeqid ::= VisibleString

GBKeyword ::= VisibleString

GBAuthor ::= VisibleString

GBReference ::= SEQUENCE {
    reference VisibleString ,
    position VisibleString OPTIONAL ,
    authors SEQUENCE OF GBAuthor OPTIONAL ,
    consortium VisibleString OPTIONAL ,
    title VisibleString OPTIONAL ,
    journal VisibleString ,
    xref SET OF GBXref OPTIONAL ,
    pubmed INTEGER OPTIONAL ,
    remark VisibleString OPTIONAL
}

GBXref ::= SEQUENCE {
    dbname VisibleString ,
    id VisibleString
}

GBTagset ::= SEQUENCE {
    authority VisibleString OPTIONAL ,
    version VisibleString OPTIONAL ,
    url VisibleString OPTIONAL ,
    tags GBTags OPTIONAL
}

GBTags ::= SEQUENCE OF GBTag

GBTag ::= SEQUENCE {
    name VisibleString OPTIONAL ,
    value VisibleString OPTIONAL ,
    unit VisibleString OPTIONAL
}

GBFeature ::= SEQUENCE {
    key VisibleString ,
    location VisibleString ,
    intervals SEQUENCE OF GBInterval OPTIONAL ,
    operator VisibleString OPTIONAL ,
    partial5 BOOLEAN OPTIONAL ,
    partial3 BOOLEAN OPTIONAL ,
    quals SEQUENCE OF GBQualifier OPTIONAL
}

GBInterval ::= SEQUENCE {
    from INTEGER OPTIONAL ,
    to INTEGER OPTIONAL ,
    point INTEGER OPTIONAL ,
    iscomp BOOLEAN OPTIONAL ,
    interbp BOOLEAN OPTIONAL ,
    accession VisibleString
}

GBQualifier ::= SEQUENCE {
    name VisibleString ,
    value VisibleString OPTIONAL
}

GBTagsetRules ::= SEQUENCE {
    authority VisibleString OPTIONAL ,
    version VisibleString OPTIONAL ,
    mandatorytags GBTagNames OPTIONAL ,
    optionaltags GBTagNames OPTIONAL ,
    uniquetags GBTagNames OPTIONAL ,
    extensible BOOLEAN OPTIONAL
}

GBTagNames ::= SEQUENCE OF VisibleString

GBTagsetRuleSet ::= SEQUENCE OF GBTagsetRules

END
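
The header comment of this module leaves several fields as plain GenBank-style strings: dates as DD-Mon-YYYY, other-seqids in the "vertical bar format", and either "sequence" or "contig" filled in. The Python sketch below shows one way a consumer might handle those three conventions. It is illustrative only: the function names, the dict-based record, and the sample identifiers are assumptions made for this example and are not part of the NCBI toolkit.

```python
from datetime import date
from typing import List, Mapping, Tuple

# Fixed month table, so parsing does not depend on the process locale.
_MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}


def parse_gb_date(text: str) -> date:
    """Parse a DD-Mon-YYYY string (update-date, create-date) into a date."""
    day, month, year = text.strip().split("-")
    return date(int(year), _MONTHS[month.upper()], int(day))


def split_seqid(text: str) -> Tuple[str, List[str]]:
    """Split one vertical-bar identifier into (type tag, remaining fields).

    Trailing empty fields (from a trailing '|') are dropped; empty fields
    in the middle are kept because their position may be meaningful.
    """
    tag, *fields = text.strip().split("|")
    while fields and fields[-1] == "":
        fields.pop()
    return tag, fields


def sequence_source(record: Mapping[str, str]) -> str:
    """Report which optional slot of a GBSeq-like mapping carries the data.

    The mapping keys mirror the ASN.1 field names (an assumption of this
    sketch). Returns "sequence", "contig", or "none".
    """
    if record.get("sequence"):
        return "sequence"
    if record.get("contig"):
        return "contig"
    return "none"
```

For instance, parse_gb_date("15-JAN-2009") yields a datetime.date for 15 January 2009, and split_seqid("gi|12345") separates the "gi" tag from its single field. A record with neither "sequence" nor "contig" corresponds to the minimal retrieval option that the header comment mentions as a possibility.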