C/doc/gene2xml.txt


  1 GENE2XML CONVERTER PROGRAM
  2 
  3 gene2xml is a stand-alone program that converts Entrez Gene ASN.1 into XML.
  4 It is available for several computer platforms (Alpha, Linux, Macintosh,
  5 Solaris, and Windows) and is distributed in the asn1-converters area of the
  6 NCBI public ftp site. From asn1-converters, navigate into by_program and
  7 then gene2xml, and download and extract the appropriate file.
  8 
  9 Entrez Gene data are stored as compressed binary Entrezgene-Set ASN.1 files
 10 on the NCBI ftp site, and have the suffix .ags.gz. These are several-fold
 11 smaller than compressed XML files, which saves a significant amount of
 12 disk storage and network bandwidth. Normal processing by gene2xml produces
 13 text XML files with the same name but with .xgs as the suffix.
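
The suffix convention above can be sketched in Python. This is only an
illustration: the helper name xgs_name is hypothetical, and it assumes the
.xgs suffix simply replaces a trailing .ags.gz or .ags.

```python
def xgs_name(input_name: str) -> str:
    """Return the .xgs name expected for an .ags.gz or .ags input name.

    Assumption: the output keeps the same base name and only the suffix
    changes, as described in the text above.
    """
    for suffix in (".ags.gz", ".ags"):
        if input_name.endswith(suffix):
            return input_name[: -len(suffix)] + ".xgs"
    raise ValueError(f"not an .ags or .ags.gz file name: {input_name!r}")
```

For instance, xgs_name("All_Mammalia.ags.gz") gives "All_Mammalia.xgs".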
 14 
 15 The command-line arguments to gene2xml are described below.
 16 
 17   -p  Path to Files [String]  Optional
 18 
 19 Use -p if you want to process an entire directory of files. In this case,
 20 gene2xml ignores the -i and -o arguments. Otherwise it takes -i as the
 21 single input file, regardless of suffix.
 22 
 23   -r  Path for Results [String]  Optional
 24 
 25 If -p is given but no -r results path is provided, results are written in
 26 the same directory as the input file. The -p argument recursively explores
 27 any subdirectories, so there can be multiple places where output is written.
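
To preview where output might appear when -p is given without -r, a short
Python sketch can walk the same directory tree. The function name
expected_outputs is hypothetical, and the sketch assumes that inputs are
recognized by the .ags.gz suffix and that each result keeps the base name
with an .xgs suffix. It does not run gene2xml.

```python
import os

def expected_outputs(root):
    """List (input, expected output) path pairs under root, recursively.

    Assumptions: inputs end in .ags.gz, and with no -r path each result
    lands next to its input with the suffix changed to .xgs.
    """
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".ags.gz"):
                out = name[: -len(".ags.gz")] + ".xgs"
                pairs.append((os.path.join(dirpath, name),
                              os.path.join(dirpath, out)))
    return pairs
```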
 28 
 29   -i  Single Input File [File In]  Optional
 30     default = stdin
 31 
 32   -o  Single Output File [File Out]  Optional
 33     default = stdout
 34 
 35 If -p is not given, -i is used for the input file, and -o is used for the
 36 output file. Suffix conventions are ignored in this case.
 37 
 38   -b  File is Binary [T/F]  Optional
 39     default = F
 40 
 41   -c  File is Compressed [T/F]  Optional
 42     default = F
 43 
 44 On UNIX platforms you can decompress .ags.gz files on-the-fly by using both
 45 -b and -c. On the PC you will need to manually decompress into .ags files
 46 and then only use the -b flag.
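
Where on-the-fly decompression is not available, the manual step can be done
with any gzip tool. As one possible approach, the Python sketch below uses
the standard gzip module; the function name gunzip_ags and the file names in
the usage note are illustrative only.

```python
import gzip
import shutil

def gunzip_ags(src: str, dst: str) -> None:
    """Decompress src (an .ags.gz file) into dst (an .ags file).

    The data is copied in chunks, so large files are not loaded into
    memory all at once.
    """
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

After gunzip_ags("All_Mammalia.ags.gz", "All_Mammalia.ags"), the resulting
.ags file would be passed to gene2xml with -b only.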
 47 
 48   -t  Taxon ID to Filter [Integer]  Optional
 49     default = 0
 50 
 51 If you want to extract only records for a particular organism, pass the
 52 NCBI taxon database number with the -t argument.  For example
 53 
 54   gene2xml -i All_Mammalia.ags.gz -b -c -t 9685 -o cats.xgs
 55 
 56 sends only the gene records for cats (taxonomy ID 9685) to the file
 57 cats.xgs.
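
When gene2xml is driven from a script, the same command line can be
assembled programmatically. The Python sketch below only builds the argument
list and does not run the program. The helper name gene2xml_args is
hypothetical, and it passes -b and -c without a value, as in the example
above.

```python
from typing import List

def gene2xml_args(infile: str, outfile: str, binary: bool = False,
                  compressed: bool = False, taxon: int = 0) -> List[str]:
    """Build a gene2xml argument list using only -i, -o, -b, -c and -t."""
    args = ["gene2xml", "-i", infile]
    if binary:
        args.append("-b")          # input is binary ASN.1
    if compressed:
        args.append("-c")          # input is gzip-compressed
    if taxon:
        args += ["-t", str(taxon)] # keep only this taxon ID (0 = no filter)
    args += ["-o", outfile]
    return args
```

The list for the cat example is
gene2xml_args("All_Mammalia.ags.gz", "cats.xgs", binary=True,
compressed=True, taxon=9685), which could then be handed to
subprocess.run(args, check=True) if gene2xml is on the PATH.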
 58 
 59   -l  Log Processing [T/F]  Optional
 60     default = F
 61 
 62 When you are processing an entire directory of files, passing -l on the
 63 command-line causes gene2xml to print the current file name as it
 64 progresses through the directory.
 65 
 66 The following arguments, -x, -y, and -z, are not normally used. Without
 67 them, gene2xml writes Entrezgene-Set XML, which is the normal
 68 situation.
 69 
 70   -x  Extract .ags -> text .agc [T/F]  Optional
 71     default = F
 72 
 73 To accommodate existing programs, the -x argument will convert .ags files
 74 to the catenated Entrezgene text ASN.1 files that were previously
 75 distributed.
 76 
 77   -y  Combine .agc -> text .ags (for testing) [T/F]  Optional
 78     default = F
 79 
 80   -z  Combine .agc -> binary .ags, then gzip [T/F]  Optional
 81     default = F
 82 
 83 NCBI uses gene2xml with the -y or -z arguments to process internal data
 84 into the compressed binary Entrezgene-Set ASN.1 files that are placed on
 85 the NCBI ftp site. It is not expected that anyone outside of NCBI would use
 86 these arguments.
 87 
 88 A sample record that illustrates the structure of Entrezgene-Set XML is
 89 shown below. Ellipses (...) are used where blocks of text have been removed
 90 for brevity in this documentation.
 91 
 92 <?xml version="1.0"?>
 93 <!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN"
 94 "http://www.ncbi.nlm.nih.gov/dtd/NCBI_Entrezgene.dtd">
 95 <Entrezgene-Set>
 96   <Entrezgene>
 97     <Entrezgene_track-info>
 98       <Gene-track>
 99         <Gene-track_geneid>2652</Gene-track_geneid>
100         <Gene-track_status value="live">0</Gene-track_status>
101         <Gene-track_create-date>
102           <Date>
103             <Date_std>
104               <Date-std>
105                 <Date-std_year>2003</Date-std_year>
106                 <Date-std_month>8</Date-std_month>
107                 <Date-std_day>28</Date-std_day>
108                 <Date-std_hour>20</Date-std_hour>
109                 <Date-std_minute>30</Date-std_minute>
110                 <Date-std_second>0</Date-std_second>
111               </Date-std>
112             </Date_std>
113           </Date>
114         </Gene-track_create-date>
115         <Gene-track_update-date>
116           <Date>
117             <Date_std>
118               <Date-std>
119                 <Date-std_year>2005</Date-std_year>
120                 <Date-std_month>4</Date-std_month>
121                 <Date-std_day>27</Date-std_day>
122                 <Date-std_hour>21</Date-std_hour>
123                 <Date-std_minute>45</Date-std_minute>
124                 <Date-std_second>0</Date-std_second>
125               </Date-std>
126             </Date_std>
127           </Date>
128         </Gene-track_update-date>
129       </Gene-track>
130     </Entrezgene_track-info>
131     <Entrezgene_type value="protein-coding">6</Entrezgene_type>
132     <Entrezgene_source>
133       <BioSource>
134         <BioSource_genome value="genomic">1</BioSource_genome>
135         <BioSource_origin value="natural">1</BioSource_origin>
136         <BioSource_org>
137           <Org-ref>
138             <Org-ref_taxname>Homo sapiens</Org-ref_taxname>
139             <Org-ref_common>human</Org-ref_common>
140             <Org-ref_db>
141               <Dbtag>
142                 <Dbtag_db>taxon</Dbtag_db>
143                 <Dbtag_tag>
144                   <Object-id>
145                     <Object-id_id>9606</Object-id_id>
146                   </Object-id>
147                 </Dbtag_tag>
148               </Dbtag>
149             </Org-ref_db>
150             <Org-ref_syn>
151               <Org-ref_syn_E>man</Org-ref_syn_E>
152             </Org-ref_syn>
153             <Org-ref_orgname>
154               <OrgName>
155                 <OrgName_name>
156                   <OrgName_name_binomial>
157                     <BinomialOrgName>
158                       <BinomialOrgName_genus>Homo</BinomialOrgName_genus>
159                       <BinomialOrgName_species>sapiens
160 </BinomialOrgName_species>
161                     </BinomialOrgName>
162                   </OrgName_name_binomial>
163                 </OrgName_name>
164                 <OrgName_lineage>Eukaryota; Metazoa; Chordata; Craniata;
165 Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates;
166 Catarrhini; Hominidae; Homo</OrgName_lineage>
167                 <OrgName_gcode>1</OrgName_gcode>
168                 <OrgName_mgcode>2</OrgName_mgcode>
169                 <OrgName_div>PRI</OrgName_div>
170               </OrgName>
171             </Org-ref_orgname>
172           </Org-ref>
173         </BioSource_org>
174         <BioSource_subtype>
175           <SubSource>
176             <SubSource_subtype value="chromosome">1</SubSource_subtype>
177             <SubSource_name>X</SubSource_name>
178           </SubSource>
179         </BioSource_subtype>
180       </BioSource>
181     </Entrezgene_source>
182     <Entrezgene_gene>
183       <Gene-ref>
184         <Gene-ref_locus>OPN1MW</Gene-ref_locus>
185         <Gene-ref_desc>opsin 1 (cone pigments), medium-wave-sensitive (color
186 blindness, deutan)</Gene-ref_desc>
187         <Gene-ref_maploc>Xq28</Gene-ref_maploc>
188         <Gene-ref_db>
189           <Dbtag>
190             <Dbtag_db>MIM</Dbtag_db>
191             <Dbtag_tag>
192               <Object-id>
193                 <Object-id_id>303800</Object-id_id>
194               </Object-id>
195             </Dbtag_tag>
196           </Dbtag>
197         </Gene-ref_db>
198         <Gene-ref_syn>
199           <Gene-ref_syn_E>CBD</Gene-ref_syn_E>
200           <Gene-ref_syn_E>DCB</Gene-ref_syn_E>
201           <Gene-ref_syn_E>GCP</Gene-ref_syn_E>
202           <Gene-ref_syn_E>CBBM</Gene-ref_syn_E>
203         </Gene-ref_syn>
204         <Gene-ref_locus-tag>HGNC:4206</Gene-ref_locus-tag>
205       </Gene-ref>
206     </Entrezgene_gene>
207     <Entrezgene_prot>
208       <Prot-ref>
209         <Prot-ref_name>
210           <Prot-ref_name_E>opsin 1 (cone pigments), medium-wave-sensitive
211 (color blindness, deutan)</Prot-ref_name_E>
212           <Prot-ref_name_E>green cone pigment</Prot-ref_name_E>
213         </Prot-ref_name>
214       </Prot-ref>
215     </Entrezgene_prot>
216     <Entrezgene_location>
217       <Maps>
218         <Maps_display-str>Xq28</Maps_display-str>
219         <Maps_method>
220           <Maps_method_map-type value="cyto"/>
221         </Maps_method>
222       </Maps>
223     </Entrezgene_location>
224     <Entrezgene_gene-source>
225       <Gene-source>
226         <Gene-source_src>LocusLink</Gene-source_src>
227         <Gene-source_src-int>2652</Gene-source_src-int>
228         <Gene-source_src-str2>2652</Gene-source_src-str2>
229       </Gene-source>
230     </Entrezgene_gene-source>
231     <Entrezgene_locus>
232       <Gene-commentary>
233         <Gene-commentary_type value="genomic">1</Gene-commentary_type>
234         <Gene-commentary_heading>Reference</Gene-commentary_heading>
235         <Gene-commentary_accession>NC_000023</Gene-commentary_accession>
236         <Gene-commentary_version>8</Gene-commentary_version>
237         <Gene-commentary_seqs>
238           <Seq-loc>
239             <Seq-loc_int>
240               <Seq-interval>
241                 <Seq-interval_from>152969013</Seq-interval_from>
242                 <Seq-interval_to>152982377</Seq-interval_to>
243                 <Seq-interval_strand>
244                   <Na-strand value="plus"/>
245                 </Seq-interval_strand>
246                 <Seq-interval_id>
247                   <Seq-id>
248                     <Seq-id_gi>51511752</Seq-id_gi>
249                   </Seq-id>
250                 </Seq-interval_id>
251               </Seq-interval>
252             </Seq-loc_int>
253           </Seq-loc>
254         </Gene-commentary_seqs>
255         <Gene-commentary_products>
256           <Gene-commentary>
257             <Gene-commentary_type value="mRNA">3</Gene-commentary_type>
258             <Gene-commentary_heading>Reference</Gene-commentary_heading>
259             <Gene-commentary_accession>NM_000513</Gene-commentary_accession>
260             <Gene-commentary_version>1</Gene-commentary_version>
261             <Gene-commentary_genomic-coords>
262               <Seq-loc>
263                 <Seq-loc_mix>
264                   <Seq-loc-mix>
265                     <Seq-loc>
266                       <Seq-loc_int>
267                         <Seq-interval>
268                           <Seq-interval_from>152969013</Seq-interval_from>
269                           <Seq-interval_to>152969124</Seq-interval_to>
270                           <Seq-interval_strand>
271                             <Na-strand value="plus"/>
272                           </Seq-interval_strand>
273                           <Seq-interval_id>
274                             <Seq-id>
275                               <Seq-id_gi>51511752</Seq-id_gi>
276                             </Seq-id>
277                           </Seq-interval_id>
278                         </Seq-interval>
279                       </Seq-loc_int>
280                     </Seq-loc>
281                     ...
282                   </Seq-loc-mix>
283                 </Seq-loc_mix>
284               </Seq-loc>
285             </Gene-commentary_genomic-coords>
286             <Gene-commentary_seqs>
287               <Seq-loc>
288                 <Seq-loc_whole>
289                   <Seq-id>
290                     <Seq-id_gi>4503964</Seq-id_gi>
291                   </Seq-id>
292                 </Seq-loc_whole>
293               </Seq-loc>
294             </Gene-commentary_seqs>
295             <Gene-commentary_products>
296               <Gene-commentary>
297                 <Gene-commentary_type value="peptide">8</Gene-commentary_type>
298                 <Gene-commentary_heading>Reference</Gene-commentary_heading>
299                 <Gene-commentary_accession>NP_000504
300 </Gene-commentary_accession>
301                 <Gene-commentary_version>1</Gene-commentary_version>
302                 <Gene-commentary_genomic-coords>
303                   <Seq-loc>
304                     <Seq-loc_packed-int>
305                       <Packed-seqint>
306                         <Seq-interval>
307                           <Seq-interval_from>152969013</Seq-interval_from>
308                           <Seq-interval_to>152969124</Seq-interval_to>
309                           <Seq-interval_strand>
310                             <Na-strand value="plus"/>
311                           </Seq-interval_strand>
312                           <Seq-interval_id>
313                             <Seq-id>
314                               <Seq-id_gi>51511752</Seq-id_gi>
315                             </Seq-id>
316                           </Seq-interval_id>
317                         </Seq-interval>
318                         ...
319                       </Packed-seqint>
320                     </Seq-loc_packed-int>
321                   </Seq-loc>
322                 </Gene-commentary_genomic-coords>
323                 <Gene-commentary_seqs>
324                   <Seq-loc>
325                     <Seq-loc_whole>
326                       <Seq-id>
327                         <Seq-id_gi>4503965</Seq-id_gi>
328                       </Seq-id>
329                     </Seq-loc_whole>
330                   </Seq-loc>
331                 </Gene-commentary_seqs>
332               </Gene-commentary>
333             </Gene-commentary_products>
334           </Gene-commentary>
335         </Gene-commentary_products>
336       </Gene-commentary>
337       ...
338     </Entrezgene_locus>
339     <Entrezgene_properties>
340       <Gene-commentary>
341         <Gene-commentary_type value="comment">254</Gene-commentary_type>
342         <Gene-commentary_label>Nomenclature</Gene-commentary_label>
343         <Gene-commentary_source>
344           <Other-source>
345             <Other-source_anchor>HUGO Gene Nomenclature Committee
346 </Other-source_anchor>
347           </Other-source>
348         </Gene-commentary_source>
349         <Gene-commentary_properties>
350           <Gene-commentary>
351             <Gene-commentary_type value="property">16</Gene-commentary_type>
352             <Gene-commentary_label>Official Symbol</Gene-commentary_label>
353             <Gene-commentary_text>OPN1MW</Gene-commentary_text>
354           </Gene-commentary>
355           <Gene-commentary>
356             <Gene-commentary_type value="property">16</Gene-commentary_type>
357             <Gene-commentary_label>Official Full Name</Gene-commentary_label>
358             <Gene-commentary_text>opsin 1 (cone pigments),
359 medium-wave-sensitive (color blindness, deutan)</Gene-commentary_text>
360           </Gene-commentary>
361         </Gene-commentary_properties>
362       </Gene-commentary>
363       ...
364     </Entrezgene_properties>
365     <Entrezgene_comments>
366       <Gene-commentary>
367         <Gene-commentary_type value="comment">254</Gene-commentary_type>
368         <Gene-commentary_heading>LocusTagLink</Gene-commentary_heading>
369         <Gene-commentary_source>
370           <Other-source>
371             <Other-source_src>
372               <Dbtag>
373                 <Dbtag_db>HGNC</Dbtag_db>
374                 <Dbtag_tag>
375                   <Object-id>
376                     <Object-id_id>4206</Object-id_id>
377                   </Object-id>
378                 </Dbtag_tag>
379               </Dbtag>
380             </Other-source_src>
381           </Other-source>
382         </Gene-commentary_source>
383       </Gene-commentary>
384       ...
385     </Entrezgene_comments>
386     <Entrezgene_unique-keys>
387       <Dbtag>
388         <Dbtag_db>LocusID</Dbtag_db>
389         <Dbtag_tag>
390           <Object-id>
391             <Object-id_id>2652</Object-id_id>
392           </Object-id>
393         </Dbtag_tag>
394       </Dbtag>
395       <Dbtag>
396         <Dbtag_db>MIM</Dbtag_db>
397         <Dbtag_tag>
398           <Object-id>
399             <Object-id_id>303800</Object-id_id>
400           </Object-id>
401         </Dbtag_tag>
402       </Dbtag>
403     </Entrezgene_unique-keys>
404     <Entrezgene_xtra-index-terms>
405       <Entrezgene_xtra-index-terms_E>LOC2652</Entrezgene_xtra-index-terms_E>
406     </Entrezgene_xtra-index-terms>
407     <Entrezgene_xtra-properties>
408       <Xtra-Terms>
409         <Xtra-Terms_tag>PROP</Xtra-Terms_tag>
410         <Xtra-Terms_value>phenotype</Xtra-Terms_value>
411       </Xtra-Terms>
412     </Entrezgene_xtra-properties>
413   </Entrezgene>
414 </Entrezgene-Set>
415 
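
As an illustration of reading this structure, the Python sketch below pulls
a few fields out of an Entrezgene-Set document with the standard
xml.etree.ElementTree module. The function name summarize and the trimmed
SAMPLE string are illustrative. SAMPLE reuses element names and values from
the record above, with most blocks left out. Values that wrap across lines
in a real file may carry surrounding whitespace and may need strip().

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample record above (most blocks omitted).
SAMPLE = """<Entrezgene-Set>
  <Entrezgene>
    <Entrezgene_track-info>
      <Gene-track>
        <Gene-track_geneid>2652</Gene-track_geneid>
      </Gene-track>
    </Entrezgene_track-info>
    <Entrezgene_source>
      <BioSource>
        <BioSource_org>
          <Org-ref>
            <Org-ref_taxname>Homo sapiens</Org-ref_taxname>
            <Org-ref_db>
              <Dbtag>
                <Dbtag_db>taxon</Dbtag_db>
                <Dbtag_tag>
                  <Object-id>
                    <Object-id_id>9606</Object-id_id>
                  </Object-id>
                </Dbtag_tag>
              </Dbtag>
            </Org-ref_db>
          </Org-ref>
        </BioSource_org>
      </BioSource>
    </Entrezgene_source>
    <Entrezgene_gene>
      <Gene-ref>
        <Gene-ref_locus>OPN1MW</Gene-ref_locus>
      </Gene-ref>
    </Entrezgene_gene>
  </Entrezgene>
</Entrezgene-Set>"""

def summarize(xml_text):
    """Return one dict per Entrezgene record with a few selected fields."""
    root = ET.fromstring(xml_text)
    rows = []
    for gene in root.iter("Entrezgene"):
        taxon = None
        # The taxon ID is the Dbtag under Org-ref_db whose db name is "taxon".
        for dbtag in gene.iterfind(".//Org-ref_db/Dbtag"):
            if dbtag.findtext("Dbtag_db") == "taxon":
                taxon = dbtag.findtext(".//Object-id_id")
        rows.append({
            "geneid": gene.findtext(".//Gene-track_geneid"),
            "locus": gene.findtext(".//Gene-ref_locus"),
            "taxname": gene.findtext(".//Org-ref_taxname"),
            "taxon": taxon,
        })
    return rows
```

Calling summarize(SAMPLE) returns a single row with geneid "2652", locus
"OPN1MW", taxname "Homo sapiens", and taxon "9606". For full-size .xgs
files, a streaming parser such as ET.iterparse is likely a better fit than
loading the whole document at once.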
