NCBI C Toolkit Cross Reference

C/biostruc/mmdb2.asn


  1 --$Revision: 6.0 $
  2 --**********************************************************************
  3 --
  4 --  Biological Macromolecule 3-D Structure Data Types for MMDB,
  5 --                A Molecular Modeling Database
  6 --
  7 --  Definitions for structural models
  8 --
  9 --  By Hitomi Ohkawa, Jim Ostell, Chris Hogue and Steve Bryant 
 10 --
 11 --  National Center for Biotechnology Information
 12 --  National Institutes of Health
 13 --  Bethesda, MD 20894 USA
 14 --
 15 --  July, 1996
 16 --
 17 --**********************************************************************
 18 
 19 MMDB-Structural-model DEFINITIONS ::=
 20 
 21 BEGIN
 22 
 23 EXPORTS Biostruc-model, Model-id, Model-coordinate-set-id;
 24 
 25 IMPORTS Chem-graph-pntrs, Atom-pntrs, Chem-graph-alignment,
 26         Sphere, Cone, Cylinder, Brick, Transform FROM MMDB-Features
 27         Biostruc-id FROM MMDB
 28         Pub FROM NCBI-Pub;
 29 
 30 -- A structural model maps chemical components into a measured three-
 31 -- dimensional space. PDB-derived biostrucs generally contain 4 models, 
 32 -- corresponding to "views" of the structure of a biomolecular assemble with 
 33 -- increasing levels of complexity.  Model types indicate the complexity of the
 34 -- view.  
 35 
 36 -- The model named "NCBI all atom" represents a view suitable for most 
 37 -- computational biology applications.  It provides complete atomic coordinate 
 38 -- data for a "single best" model, omitting statistical disorder information 
 39 -- and/or ensemble structure descriptions provided in the source PDB file.  
 40 -- Construction of the single best model is based on the assumption that the 
 41 -- contents of the "alternate conformation" field from pdb imply no correlation
 42 -- among the occupancies of multiple sites assigned to sets of atoms: the best 
 43 -- site is chosen only on the basis of highest occupancy. Note, however, that 
 44 -- alternate conformation sets where correlation is implied are generally 
 45 -- constrained in crystallographic refinement to have uniform occupancy, and 
 46 -- will thus be selected as a set. For ensemble models the model which assigns 
 47 -- coordinates to the most atoms is chosen.  If numbers of coordinates are the 
 48 -- same, the model occurring first in the PDB file is selected.  The single 
 49 -- best model includes complete coordinates for all nonpolymer components, but 
 50 -- omits those classified as "solvent".  Model type is 3 for this model. 
 51 
 52 -- The model named "NCBI backbone" represents a simple view intended for 
 53 -- graphic displays and rapid transmission over a network.  It includes only 
 54 -- alpha carbon or backbone phosphate coordinates for biopolymers. It is based 
 55 -- on selection of alpha-carbon and backbone phosphate atoms from the "NCBI
 56 -- all atom" model. The model type is set to 2.  An even simpler model gives 
 57 -- only a cartoon representation, using cylinders corresponding to secondary 
 58 -- structure elements.  This is named "NCBI vector", and has model type 1.
 59 
 60 -- The models named "PDB Model 1", "PDB Model 2", etc. represent the complete
 61 -- information provided by PDB, including full descriptions of statistical
 62 -- disorder.  The name of the model is based on the contents of the PDB MODEL
 63 -- record, with a default name of "PDB Model 1" for PDB files which contain 
 64 -- only a single model.  Construction of these models is based on the 
 65 -- assumption that contents of the PDB "alternate conformation" field are 
 66 -- intended to imply correlation among the occupancies of atom sets flagged by
 67 -- the same identifier.  The special flag " " (blank) is assumed to indicate 
 68 -- sites occupied in all alternate conformations, and sites flagged otherwise,
 69 -- together with " ", to indicate a distinct member of an ensemble of 
 70 -- alternate conformations.  Note that construction of ensemble members 
 71 -- according to these assumption requires two validation checks on PDB 
 72 -- "alternate conformation" flags: they must be unique among sites assigned to 
 73 -- the same atom, and that the special " " flag must occur only for unique
 74 -- sites.  Sites which violate the first check are flagged as "u", for 
 75 -- "unknown"; they are omitted from all ensemble definitions but are 
 76 -- nontheless retained in the coordinate list.  Sites which violate the second
 77 -- check are flagged "b" for "blank", and are included in an appropriately
 78 -- named ensemble.  The model type for pdb all models is 4.
 79 
 80 -- Note that in the MMDB database models are stored in the ASN.1 stream in
 81 -- order of increasing model type value.  Since models occur as the last item
 82 -- in a biostruc, parsers may avoid reading the entire stream if the desired
 83 -- model is one of the simplified types, which occur first in the stream. This
 84 -- can save considerable I/O time, particularly for large ensemble models from 
 85 -- NMR determinations.
 86 
 87 Biostruc-model ::= SEQUENCE {
 88         id                      Model-id,
 89         type                    Model-type,
 90         descr                   SEQUENCE OF Model-descr OPTIONAL,
 91         model-space             Model-space OPTIONAL,
 92         model-coordinates       SEQUENCE OF Model-coordinate-set OPTIONAL }
 93 
 94 Model-id ::= INTEGER
 95 
 96 Model-type ::= INTEGER {
 97         ncbi-vector(1),
 98         ncbi-backbone(2),
 99         ncbi-all-atom(3),
100         pdb-model(4),
101         other(255)}
102 
103 Model-descr ::= CHOICE {
104         name                    VisibleString,
105         pdb-reso                VisibleString,
106         pdb-method              VisibleString,
107         pdb-comment             VisibleString,
108         other-comment           VisibleString,
109         attribution             Pub }
110 
111 -- The model space defines measurement units and any external reference frame.
112 -- Coordinates refer to a right-handed orthogonal system defined on axes 
113 -- tagged x, y and z in the coordinate and feature definitions of a biostruc.
114 -- Coordinates from PDB-derived structures are reported without change, in
115 -- angstrom units.  The units of temperature and occupancy factors are not
116 -- defined explicitly in PDB, but are inferred from their value range.
117 
118 Model-space ::= SEQUENCE {
119         coordinate-units        ENUMERATED {
120                                         angstroms(1),
121                                         nanometers(2),
122                                         other(3),
123                                         unknown(255)},
124         thermal-factor-units    ENUMERATED {
125                                         b(1),
126                                         u(2),
127                                         other(3),
128                                         unknown(255)} OPTIONAL,
129         occupancy-factor-units  ENUMERATED {
130                                         fractional(1),
131                                         electrons(2),
132                                         other(3),
133                                         unknown(255)} OPTIONAL,
134         density-units           ENUMERATED {
135                                         electrons-per-unit-volume(1),
136                                         arbitrary-scale(2),
137                                         other(3),
138                                         unknown(255)} OPTIONAL,
139         reference-frame         Reference-frame OPTIONAL }
140 
141 -- An external reference frame is a pointer to another biostruc, with an 
142 -- optional operator to rotate and translate coordinates into its model space.
143 -- This item is intended for representation of homology-derived model 
144 -- structures, and is not present for structures from PDB.
145 
146 Reference-frame ::= SEQUENCE {
147         biostruc-id             Biostruc-id,
148         rotation-translation    Transform OPTIONAL }
149 
150 -- Atomic coordinates may be assigned literally or by reference to another
151 -- biostruc.  The reference coordinate type is used to represent homology-
152 -- derived model structures.  PDB-derived structures have literal coordinates.
153 
154 -- Referenced coordinates identify another biostruc, any transformation to be 
155 -- applied to coordinates from that biostruc, and a mapping of the chemical
156 -- graph of the present biostruc onto that of the referenced biostruc.  They
157 -- give an "alignment" of atoms in the current biostruc with those in another,
158 -- from which the coordinates of matched atoms may be retrieved.  For non-
159 -- atomic models "alignment" may also be represented by molecule and residue
160 -- equivalence lists.  Referenced coordinates are a data item inteded for 
161 -- representation of homology models, with an explicit pointer to their source
162 -- information. They do not occur in PDB-derived models.
163 
164 Model-coordinate-set ::= SEQUENCE {
165         id                      Model-coordinate-set-id OPTIONAL,
166         descr                   SEQUENCE OF Model-descr OPTIONAL,
167         coordinates             CHOICE {
168                 literal                 Coordinates,
169                 reference               Chem-graph-alignment } }
170         
171 Model-coordinate-set-id ::= INTEGER
172 
173 
174 -- Literal coordinates map chemical components into the model space.  Three 
175 -- mapping types are allowed, atomic coordinate models, density-grid models,
176 -- and surface models. A model consists of a sequence of such coordinate sets, 
177 -- and may thus combine coordinate subsets which have a different source.  
178 -- PDB-derived models contain a single atomic coordinate set, as they by
179 -- definition represent information from a single source.
180 
181 Coordinates ::= CHOICE {                
182         atomic                  Atomic-coordinates,
183         surface                 Surface-coordinates,
184         density                 Density-coordinates }
185 
186 -- Literal atomic coordinate values give location, occupancy and order
187 -- parameters, and a pointer to a specific atom defined in the biostruc graph.
188 -- Temperature and occupancy factors have their conventional crystallographic
189 -- definitions, with units defined in the model space declaration.  Atoms,
190 -- sites, temperature-factors, occupancies and alternate-conformation-ids
191 -- are parallel arrays, i.e. the have the same number of values as given by
192 -- number-of-points. Conformation ensembles represent distinct correlated-
193 -- disorder subsets of the coordinates.  They will be present only for certain 
194 -- "views" of PDB structures, as described above. Their derivation from PDB-
195 -- supplied "alternate-conformation" ids is described below. 
196 
197 Atomic-coordinates ::= SEQUENCE {
198         number-of-points        INTEGER,
199         atoms                   Atom-pntrs,
200         sites                   Model-space-points,
201         temperature-factors     Atomic-temperature-factors OPTIONAL,
202         occupancies             Atomic-occupancies OPTIONAL, 
203         alternate-conf-ids      Alternate-conformation-ids OPTIONAL,
204         conf-ensembles          SEQUENCE OF Conformation-ensemble OPTIONAL }
205 
206 -- The atoms whose location is described by each coordinate are identified
207 -- via a hierarchical pointer to the chemical graph of the biomolecular
208 -- assembly.  Coordinates may be matched with atoms in the chemical structure
209 -- by the values of the molecule, residue and atom id's given here,  which 
210 -- match exactly the items of the same type defined in Biostruc-graph.
211 
212 -- Coordinates are given as integer values, with a scale factor to convert 
213 -- to real values for each x, y or z, in the units indicated in model-space.
214 -- Integer values must be divided by the the scale factor.  This use of integer
215 -- values reduces the ASN.1 stream size. The scale factors for temperature 
216 -- factors and occupancies are given separately, but must be used in the same 
217 -- fashion to produce properly scaled real values.
218 
219 Model-space-points ::= SEQUENCE {
220         scale-factor            INTEGER,
221         x                       SEQUENCE OF INTEGER,    
222         y                       SEQUENCE OF INTEGER,
223         z                       SEQUENCE OF INTEGER } 
224 
225 Atomic-temperature-factors ::= CHOICE {
226         isotropic               Isotropic-temperature-factors,
227         anisotropic             Anisotropic-temperature-factors }
228 
229 Isotropic-temperature-factors ::= SEQUENCE {
230         scale-factor            INTEGER,
231         b                       SEQUENCE OF INTEGER }
232 
233 Anisotropic-temperature-factors ::= SEQUENCE {
234         scale-factor            INTEGER,
235         b-11                    SEQUENCE OF INTEGER,
236         b-12                    SEQUENCE OF INTEGER,
237         b-13                    SEQUENCE OF INTEGER,
238         b-22                    SEQUENCE OF INTEGER,
239         b-23                    SEQUENCE OF INTEGER,
240         b-33                    SEQUENCE OF INTEGER }
241 
242 Atomic-occupancies ::= SEQUENCE {
243         scale-factor            INTEGER,
244         o                       SEQUENCE OF INTEGER }
245 
246 -- An alternate conformation id is optionally associated with each coordinate. 
247 -- Aside from corrections due to the validation checks described above, the 
248 -- contents of MMDB Alternate-conformation-ids are identical to the PDB 
249 -- "alternate conformation" field.
250 
251 Alternate-conformation-ids ::= SEQUENCE OF Alternate-conformation-id 
252 
253 Alternate-conformation-id ::= VisibleString 
254 
255 -- Correlated disorder ensemble is defined by a set of alternate conformation 
256 -- id's which identify coordinates relevant to that ensemble. These are 
257 -- defined from the validated and corrected contents of the PDB "alternate
258 -- conformation" field as described above.  A given ensemble, for example, may
259 -- consist of atom sites flagged by " " and "A" Alternate-conformation-ids. 
260 -- Names for ensembles are constructed from these flags. This example would be
261 -- named, in its description, "PDB Ensemble blank plus A".
262 
263 -- Note that this interpretation is consistent with common PDB usage of the 
264 -- "alternate conformation" field, but that PDB specifications do not formally
265 -- distinguish between correlated and uncorrelated disorder in crystallographic
266 -- models. Ensembles identified in MMDB thus may not correspond to the meaning
267 -- intended by PDB or the depositor.  No information is lost, however, and
268 -- if the intended meaning is known alternative ensemble descriptions may be
269 -- reconstructed directly from the Alternate-conformation-ids.
270 
271 -- Note that correlated disorder as defined here is allowed within an atomic 
272 -- coordinate set but not between the multiple sets which may define a model. 
273 -- Multiple sets within the same model are intended as a means to represent 
274 -- assemblies modeled from different experimentally determined structures,
275 -- where correlated disorder between coordinate sets is not relevant.
276 
277 Conformation-ensemble ::= SEQUENCE {
278         name            VisibleString,
279         alt-conf-ids    SEQUENCE OF Alternate-conformation-id }
280 
281 
282 -- Literal surface coordinates define the chemical components whose structure
283 -- is described by a surface, and the surface itself.  The surface may be
284 -- either a regular geometric solid or a triangle-mesh of arbitrary shape.
285 
286 Surface-coordinates ::= SEQUENCE {
287         contents                Chem-graph-pntrs,
288         surface                 CHOICE {        sphere          Sphere,
289                                                 cone            Cone,
290                                                 cylinder        Cylinder,
291                                                 brick           Brick,
292                                                 tmesh           T-mesh,
293                                                 triangles       Triangles } }
294 T-mesh ::= SEQUENCE {
295         number-of-points        INTEGER,
296         scale-factor            INTEGER,
297         swap                    SEQUENCE OF BOOLEAN,
298         x                       SEQUENCE OF INTEGER,
299         y                       SEQUENCE OF INTEGER,
300         z                       SEQUENCE OF INTEGER }
301 
302 Triangles ::= SEQUENCE {
303         number-of-points        INTEGER,
304         scale-factor            INTEGER,
305         x                       SEQUENCE OF INTEGER,
306         y                       SEQUENCE OF INTEGER,
307         z                       SEQUENCE OF INTEGER,
308         number-of-triangles     INTEGER,
309         v1                      SEQUENCE OF INTEGER, 
310         v2                      SEQUENCE OF INTEGER,
311         v3                      SEQUENCE OF INTEGER }
312 
313 
314 -- Literal density coordinates define the chemical components whose structure
315 -- is described by a density grid, parameters of this grid, and density values.
316 
317 Density-coordinates ::= SEQUENCE {
318         contents                Chem-graph-pntrs,
319         grid-corners            Brick,
320         grid-steps-x            INTEGER,
321         grid-steps-y            INTEGER,
322         grid-steps-z            INTEGER,
323         fastest-varying         ENUMERATED {
324                                         x(1),
325                                         y(2),
326                                         z(3)},
327         slowest-varying         ENUMERATED {
328                                         x(1),
329                                         y(2),
330                                         z(3)},
331         scale-factor            INTEGER,
332         density                 SEQUENCE OF INTEGER }
333 
334 
335 END

source navigation ]   [ diff markup ]   [ identifier search ]   [ freetext search ]   [ file search ]  

This page was automatically generated by the LXR engine.
Visit the LXR main site for more information.