-- NCBI C Toolkit: C/biostruc/mmdb2.asn
--$Revision: 6.0 $
--**********************************************************************
--
-- Biological Macromolecule 3-D Structure Data Types for MMDB,
-- A Molecular Modeling Database
--
-- Definitions for structural models
--
-- By Hitomi Ohkawa, Jim Ostell, Chris Hogue and Steve Bryant
--
-- National Center for Biotechnology Information
-- National Institutes of Health
-- Bethesda, MD 20894 USA
--
-- July, 1996
--
--**********************************************************************

MMDB-Structural-model DEFINITIONS ::=

BEGIN

EXPORTS Biostruc-model, Model-id, Model-coordinate-set-id;

IMPORTS Chem-graph-pntrs, Atom-pntrs, Chem-graph-alignment,
        Sphere, Cone, Cylinder, Brick, Transform FROM MMDB-Features
        Biostruc-id FROM MMDB
        Pub FROM NCBI-Pub;

-- A structural model maps chemical components into a measured
-- three-dimensional space. PDB-derived biostrucs generally contain 4 models,
-- corresponding to "views" of the structure of a biomolecular assembly with
-- increasing levels of complexity. Model types indicate the complexity of the
-- view.

-- The model named "NCBI all atom" represents a view suitable for most
-- computational biology applications. It provides complete atomic coordinate
-- data for a "single best" model, omitting statistical disorder information
-- and/or ensemble structure descriptions provided in the source PDB file.
-- Construction of the single best model is based on the assumption that the
-- contents of the "alternate conformation" field from PDB imply no correlation
-- among the occupancies of multiple sites assigned to sets of atoms: the best
-- site is chosen only on the basis of highest occupancy. Note, however, that
-- alternate conformation sets where correlation is implied are generally
-- constrained in crystallographic refinement to have uniform occupancy, and
-- will thus be selected as a set. For ensemble models, the model which assigns
-- coordinates to the most atoms is chosen. If the numbers of coordinates are
-- the same, the model occurring first in the PDB file is selected. The single
-- best model includes complete coordinates for all nonpolymer components, but
-- omits those classified as "solvent". Model type is 3 for this model.

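-- Illustrative example (hypothetical values, not taken from any PDB entry):
-- one atom given two sites in the PDB file, with no correlation assumed
-- between them, so only the higher-occupancy site is kept.
--
--   site   alternate conformation   occupancy   in "NCBI all atom"?
--   1      "A"                      0.60        yes, highest occupancy
--   2      "B"                      0.40        no
--
-- Likewise, for an NMR file whose MODEL 1 and MODEL 2 assign coordinates to
-- the same number of atoms, MODEL 1 is chosen because it occurs first.
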
-- The model named "NCBI backbone" represents a simple view intended for
-- graphic displays and rapid transmission over a network. It includes only
-- alpha carbon or backbone phosphate coordinates for biopolymers. It is based
-- on selection of alpha-carbon and backbone phosphate atoms from the "NCBI
-- all atom" model. The model type is set to 2. An even simpler model gives
-- only a cartoon representation, using cylinders corresponding to secondary
-- structure elements. This is named "NCBI vector", and has model type 1.

-- The models named "PDB Model 1", "PDB Model 2", etc. represent the complete
-- information provided by PDB, including full descriptions of statistical
-- disorder. The name of the model is based on the contents of the PDB MODEL
-- record, with a default name of "PDB Model 1" for PDB files which contain
-- only a single model. Construction of these models is based on the
-- assumption that contents of the PDB "alternate conformation" field are
-- intended to imply correlation among the occupancies of atom sets flagged by
-- the same identifier. The special flag " " (blank) is assumed to indicate
-- sites occupied in all alternate conformations, and each other flag,
-- together with " ", to indicate a distinct member of an ensemble of
-- alternate conformations. Note that construction of ensemble members
-- according to these assumptions requires two validation checks on PDB
-- "alternate conformation" flags: they must be unique among sites assigned to
-- the same atom, and the special " " flag must occur only for unique sites.
-- Sites which violate the first check are flagged as "u", for "unknown";
-- they are omitted from all ensemble definitions but are nonetheless
-- retained in the coordinate list. Sites which violate the second check are
-- flagged "b", for "blank", and are included in an appropriately named
-- ensemble. The model type for all PDB models is 4.

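-- Illustrative example (hypothetical atoms and flags; one reading of the two
-- validation checks above, not output from the MMDB build software):
--
--   atom   PDB alternate conformation flags   MMDB alternate-conf-ids
--   N      " "                                " "        unique site, valid
--   CB     "A", "B"                           "A", "B"   valid
--   OG     "A", "A"                           "u", "u"   fails check 1
--   CD     " ", "A"                           "b", "A"   fails check 2
--
-- Apart from the flagged sites, these ids define two ensembles, " " plus "A"
-- and " " plus "B". The "u" sites remain in the coordinate list but belong
-- to no ensemble.
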
-- Note that in the MMDB database, models are stored in the ASN.1 stream in
-- order of increasing model type value. Since models occur as the last item
-- in a biostruc, parsers may avoid reading the entire stream if the desired
-- model is one of the simplified types, which occur first in the stream. This
-- can save considerable I/O time, particularly for large ensemble models from
-- NMR determinations.

Biostruc-model ::= SEQUENCE {
    id                  Model-id,
    type                Model-type,
    descr               SEQUENCE OF Model-descr OPTIONAL,
    model-space         Model-space OPTIONAL,
    model-coordinates   SEQUENCE OF Model-coordinate-set OPTIONAL }

Model-id ::= INTEGER

Model-type ::= INTEGER {
    ncbi-vector(1),
    ncbi-backbone(2),
    ncbi-all-atom(3),
    pdb-model(4),
    other(255) }

Model-descr ::= CHOICE {
    name            VisibleString,
    pdb-reso        VisibleString,
    pdb-method      VisibleString,
    pdb-comment     VisibleString,
    other-comment   VisibleString,
    attribution     Pub }

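-- Illustrative value (hypothetical; the id is arbitrary and the optional
-- model-space and model-coordinates are omitted): the header of an
-- "NCBI all atom" model in ASN.1 value notation. The type may be given by
-- its named number, ncbi-all-atom, which stands for the integer 3.
--
--   example-model Biostruc-model ::= {
--       id 1,
--       type ncbi-all-atom,
--       descr { name "NCBI all atom" } }
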
-- The model space defines measurement units and any external reference frame.
-- Coordinates refer to a right-handed orthogonal system defined on axes
-- tagged x, y and z in the coordinate and feature definitions of a biostruc.
-- Coordinates from PDB-derived structures are reported without change, in
-- angstrom units. The units of temperature and occupancy factors are not
-- defined explicitly in PDB, but are inferred from their value range.

Model-space ::= SEQUENCE {
    coordinate-units ENUMERATED {
        angstroms(1),
        nanometers(2),
        other(3),
        unknown(255) },
    thermal-factor-units ENUMERATED {
        b(1),
        u(2),
        other(3),
        unknown(255) } OPTIONAL,
    occupancy-factor-units ENUMERATED {
        fractional(1),
        electrons(2),
        other(3),
        unknown(255) } OPTIONAL,
    density-units ENUMERATED {
        electrons-per-unit-volume(1),
        arbitrary-scale(2),
        other(3),
        unknown(255) } OPTIONAL,
    reference-frame Reference-frame OPTIONAL }

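-- Illustrative value (hypothetical): a model space as it might be declared
-- for a PDB-derived structure, with coordinates in angstroms, B (rather
-- than U) thermal factors and fractional occupancies. No density units or
-- external reference frame are declared, since neither applies here.
--
--   example-space Model-space ::= {
--       coordinate-units angstroms,
--       thermal-factor-units b,
--       occupancy-factor-units fractional }
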
-- An external reference frame is a pointer to another biostruc, with an
-- optional operator to rotate and translate coordinates into its model space.
-- This item is intended for representation of homology-derived model
-- structures, and is not present for structures from PDB.

Reference-frame ::= SEQUENCE {
    biostruc-id             Biostruc-id,
    rotation-translation    Transform OPTIONAL }

-- Atomic coordinates may be assigned literally or by reference to another
-- biostruc. The reference coordinate type is used to represent
-- homology-derived model structures. PDB-derived structures have literal
-- coordinates.

-- Referenced coordinates identify another biostruc, any transformation to be
-- applied to coordinates from that biostruc, and a mapping of the chemical
-- graph of the present biostruc onto that of the referenced biostruc. They
-- give an "alignment" of atoms in the current biostruc with those in another,
-- from which the coordinates of matched atoms may be retrieved. For
-- non-atomic models, "alignment" may also be represented by molecule and
-- residue equivalence lists. Referenced coordinates are a data item intended
-- for representation of homology models, with an explicit pointer to their
-- source information. They do not occur in PDB-derived models.

Model-coordinate-set ::= SEQUENCE {
    id              Model-coordinate-set-id OPTIONAL,
    descr           SEQUENCE OF Model-descr OPTIONAL,
    coordinates     CHOICE {
        literal     Coordinates,
        reference   Chem-graph-alignment } }

Model-coordinate-set-id ::= INTEGER


-- Literal coordinates map chemical components into the model space. Three
-- mapping types are allowed: atomic coordinate models, density-grid models,
-- and surface models. A model consists of a sequence of such coordinate sets,
-- and may thus combine coordinate subsets which come from different sources.
-- PDB-derived models contain a single atomic coordinate set, as they by
-- definition represent information from a single source.

Coordinates ::= CHOICE {
    atomic      Atomic-coordinates,
    surface     Surface-coordinates,
    density     Density-coordinates }

-- Literal atomic coordinate values give location, occupancy and order
-- parameters, and a pointer to a specific atom defined in the biostruc graph.
-- Temperature and occupancy factors have their conventional crystallographic
-- definitions, with units defined in the model space declaration. Atoms,
-- sites, temperature-factors, occupancies and alternate-conformation-ids
-- are parallel arrays, i.e. they have the same number of values as given by
-- number-of-points. Conformation ensembles represent distinct
-- correlated-disorder subsets of the coordinates. They will be present only
-- for certain "views" of PDB structures, as described above. Their derivation
-- from PDB-supplied "alternate-conformation" ids is described below.

Atomic-coordinates ::= SEQUENCE {
    number-of-points        INTEGER,
    atoms                   Atom-pntrs,
    sites                   Model-space-points,
    temperature-factors     Atomic-temperature-factors OPTIONAL,
    occupancies             Atomic-occupancies OPTIONAL,
    alternate-conf-ids      Alternate-conformation-ids OPTIONAL,
    conf-ensembles          SEQUENCE OF Conformation-ensemble OPTIONAL }

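-- Illustrative value (hypothetical numbers, not from any MMDB entry): two
-- atoms, so each parallel array below holds exactly number-of-points = 2
-- entries. The Atom-pntrs field names are assumed from its definition in
-- MMDB-Features; the optional alternate-conf-ids and conf-ensembles are
-- omitted.
--
--   example-atomic Atomic-coordinates ::= {
--       number-of-points 2,
--       atoms { number-of-ptrs 2,
--               molecule-ids { 1, 1 },
--               residue-ids { 1, 1 },
--               atom-ids { 1, 2 } },
--       sites { scale-factor 1000,
--               x { 12345, 13410 },
--               y { -6780, -6100 },
--               z { 2500, 3720 } },
--       temperature-factors isotropic { scale-factor 100,
--                                       b { 2050, 1875 } },
--       occupancies { scale-factor 100, o { 100, 100 } } }
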
-- The atoms whose location is described by each coordinate are identified
-- via a hierarchical pointer to the chemical graph of the biomolecular
-- assembly. Coordinates may be matched with atoms in the chemical structure
-- by the values of the molecule, residue and atom ids given here, which
-- match exactly the items of the same type defined in Biostruc-graph.

-- Coordinates are given as integer values, with a scale factor to convert
-- them to real values for each x, y or z, in the units indicated in
-- model-space. Integer values must be divided by the scale factor. This use
-- of integer values reduces the ASN.1 stream size. The scale factors for
-- temperature factors and occupancies are given separately, but must be used
-- in the same fashion to produce properly scaled real values.

Model-space-points ::= SEQUENCE {
    scale-factor    INTEGER,
    x               SEQUENCE OF INTEGER,
    y               SEQUENCE OF INTEGER,
    z               SEQUENCE OF INTEGER }

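-- Worked example (hypothetical values): with scale-factor 1000 and the
-- integers x = 12345, y = -6780 and z = 2500, the real-valued site is
--
--   x = 12345 / 1000 = 12.345
--   y = -6780 / 1000 = -6.780
--   z =  2500 / 1000 =  2.500
--
-- in the units named by coordinate-units (angstroms for PDB-derived models).
-- The division should be done in floating point; truncating integer
-- division would discard the fractional part.
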
Atomic-temperature-factors ::= CHOICE {
    isotropic       Isotropic-temperature-factors,
    anisotropic     Anisotropic-temperature-factors }

Isotropic-temperature-factors ::= SEQUENCE {
    scale-factor    INTEGER,
    b               SEQUENCE OF INTEGER }

Anisotropic-temperature-factors ::= SEQUENCE {
    scale-factor    INTEGER,
    b-11            SEQUENCE OF INTEGER,
    b-12            SEQUENCE OF INTEGER,
    b-13            SEQUENCE OF INTEGER,
    b-22            SEQUENCE OF INTEGER,
    b-23            SEQUENCE OF INTEGER,
    b-33            SEQUENCE OF INTEGER }

Atomic-occupancies ::= SEQUENCE {
    scale-factor    INTEGER,
    o               SEQUENCE OF INTEGER }

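-- Worked example (hypothetical values): with an isotropic scale-factor of
-- 100, an integer b of 2050 gives a temperature factor of 2050 / 100 = 20.50;
-- with an occupancy scale-factor of 100, an integer o of 60 gives an
-- occupancy of 60 / 100 = 0.60. For anisotropic factors, each of b-11, b-12,
-- b-13, b-22, b-23 and b-33 is divided by the single scale-factor in the
-- same way. Units are those declared in model-space.
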
-- An alternate conformation id is optionally associated with each coordinate.
-- Aside from corrections due to the validation checks described above, the
-- contents of MMDB Alternate-conformation-ids are identical to the PDB
-- "alternate conformation" field.

Alternate-conformation-ids ::= SEQUENCE OF Alternate-conformation-id

Alternate-conformation-id ::= VisibleString

-- A correlated disorder ensemble is defined by a set of alternate
-- conformation ids which identify coordinates relevant to that ensemble.
-- These are defined from the validated and corrected contents of the PDB
-- "alternate conformation" field as described above. A given ensemble, for
-- example, may consist of atom sites flagged by " " and "A"
-- Alternate-conformation-ids. Names for ensembles are constructed from these
-- flags. This example would be named, in its description,
-- "PDB Ensemble blank plus A".

-- Note that this interpretation is consistent with common PDB usage of the
-- "alternate conformation" field, but that PDB specifications do not formally
-- distinguish between correlated and uncorrelated disorder in crystallographic
-- models. Ensembles identified in MMDB thus may not correspond to the meaning
-- intended by PDB or the depositor. No information is lost, however, and
-- if the intended meaning is known, alternative ensemble descriptions may be
-- reconstructed directly from the Alternate-conformation-ids.

-- Note that correlated disorder as defined here is allowed within an atomic
-- coordinate set but not between the multiple sets which may define a model.
-- Multiple sets within the same model are intended as a means to represent
-- assemblies modeled from different experimentally determined structures,
-- where correlated disorder between coordinate sets is not relevant.

Conformation-ensemble ::= SEQUENCE {
    name            VisibleString,
    alt-conf-ids    SEQUENCE OF Alternate-conformation-id }

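-- Illustrative value (hypothetical): the two ensembles that would be defined
-- for coordinates carrying the Alternate-conformation-ids " ", "A" and "B",
-- with names formed by analogy with the "blank plus A" example above.
--
--   example-ensembles SEQUENCE OF Conformation-ensemble ::= {
--       { name "PDB Ensemble blank plus A", alt-conf-ids { " ", "A" } },
--       { name "PDB Ensemble blank plus B", alt-conf-ids { " ", "B" } } }
--
-- One ensemble member is recovered by keeping each coordinate whose entry in
-- alternate-conf-ids appears in that ensemble's alt-conf-ids list.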

-- Literal surface coordinates define the chemical components whose structure
-- is described by a surface, and the surface itself. The surface may be
-- either a regular geometric solid or a triangle-mesh of arbitrary shape.

Surface-coordinates ::= SEQUENCE {
    contents    Chem-graph-pntrs,
    surface     CHOICE {
        sphere      Sphere,
        cone        Cone,
        cylinder    Cylinder,
        brick       Brick,
        tmesh       T-mesh,
        triangles   Triangles } }

T-mesh ::= SEQUENCE {
    number-of-points    INTEGER,
    scale-factor        INTEGER,
    swap                SEQUENCE OF BOOLEAN,
    x                   SEQUENCE OF INTEGER,
    y                   SEQUENCE OF INTEGER,
    z                   SEQUENCE OF INTEGER }

Triangles ::= SEQUENCE {
    number-of-points        INTEGER,
    scale-factor            INTEGER,
    x                       SEQUENCE OF INTEGER,
    y                       SEQUENCE OF INTEGER,
    z                       SEQUENCE OF INTEGER,
    number-of-triangles     INTEGER,
    v1                      SEQUENCE OF INTEGER,
    v2                      SEQUENCE OF INTEGER,
    v3                      SEQUENCE OF INTEGER }


-- Literal density coordinates define the chemical components whose structure
-- is described by a density grid, parameters of this grid, and density values.

Density-coordinates ::= SEQUENCE {
    contents            Chem-graph-pntrs,
    grid-corners        Brick,
    grid-steps-x        INTEGER,
    grid-steps-y        INTEGER,
    grid-steps-z        INTEGER,
    fastest-varying     ENUMERATED {
        x(1),
        y(2),
        z(3) },
    slowest-varying     ENUMERATED {
        x(1),
        y(2),
        z(3) },
    scale-factor        INTEGER,
    density             SEQUENCE OF INTEGER }

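-- Worked example (hypothetical values; assumes that grid-steps-x, -y and -z
-- count grid points rather than intervals, and that the axis which is
-- neither fastest nor slowest varies at the intermediate rate). For a
-- 4 x 3 x 2 grid with fastest-varying x and slowest-varying z, the
-- zero-based position of grid point (ix, iy, iz) in the density list is
--
--   offset = ix + 4 * (iy + 3 * iz)
--
-- so point (1, 2, 1) is at 1 + 4 * (2 + 3 * 1) = 21, the 22nd of the
-- 4 * 3 * 2 = 24 values. Its density is that integer divided by
-- scale-factor, in the units named by density-units.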

END