GENE2XML CONVERTER PROGRAM

gene2xml is a stand-alone program that converts Entrez Gene ASN.1 into XML.
It is available for several computer platforms (Alpha, Linux, Macintosh,
Solaris, and Windows) and is distributed in the asn1-converters area of the
NCBI public ftp site. From asn1-converters, navigate into by_program and
then gene2xml, and download and extract the appropriate file.

Entrez Gene data are stored as compressed binary Entrezgene-Set ASN.1 files
on the NCBI ftp site, and have the suffix .ags.gz. These are several-fold
smaller than compressed XML files, resulting in significant savings of
disk storage and network bandwidth. Normal processing by gene2xml produces
text XML files with the same name but with .xgs as the suffix.

The command-line arguments to gene2xml are described below.

  -p  Path to Files [String]  Optional

Use -p if you want to process an entire directory of files. In this case,
gene2xml ignores the -i and -o arguments. Otherwise it takes -i as the
single input file, regardless of suffix.

  -r  Path for Results [String]  Optional

If -p is given but no -r results path is provided, results are written in
the same directory as the input file. The -p argument recursively explores
any subdirectories, so there can be multiple places where output is written.

  -i  Single Input File [File In]  Optional
    default = stdin

  -o  Single Output File [File Out]  Optional
    default = stdout

If -p is not given, -i is used for the input file, and -o is used for the
output file. Suffix conventions are ignored in this case.

  -b  File is Binary [T/F]  Optional
    default = F

  -c  File is Compressed [T/F]  Optional
    default = F

On UNIX platforms you can decompress .ags.gz files on-the-fly by using both
-b and -c. On the PC you will need to manually decompress into .ags files
and then only use the -b flag.
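
The manual decompression step described for the PC can be done with any
gzip-capable tool. The following is a minimal sketch in Python, using only
the standard library. The function names decompress_ags and
gene2xml_command are illustrative and are not part of gene2xml; the
argument list uses only the -i, -b, and -o arguments described above.

```python
import gzip
import shutil
from pathlib import Path


def decompress_ags(gz_path):
    """Decompress name.ags.gz into name.ags in the same directory.

    Returns the path of the decompressed file.
    """
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path.name}")
    ags_path = gz_path.with_suffix("")  # drops only the trailing .gz
    with gzip.open(gz_path, "rb") as src, open(ags_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # copies in chunks, so large files are fine
    return ags_path


def gene2xml_command(ags_path, xgs_path):
    """Build the argument list for a decompressed binary file (-b, no -c)."""
    return ["gene2xml", "-i", str(ags_path), "-b", "-o", str(xgs_path)]
```

The resulting list can be passed to subprocess.run, assuming the gene2xml
executable is on the search path.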

  -t  Taxon ID to Filter [Integer]  Optional
    default = 0

If you want to extract only records for a particular organism, pass the
NCBI taxon database number with the -t argument. For example

  gene2xml -i All_Mammalia.ags.gz -b -c -t 9685 -o cats.xgs

will only send gene records for cats (taxonomy ID 9685) to the file
cats.xgs.

  -l  Log Processing [T/F]  Optional
    default = F

When you are processing an entire directory of files, passing -l on the
command-line causes gene2xml to print the current file name as it
progresses through the directory.

The following arguments, -x, -y, and -z, are normally not used, and
gene2xml will default to writing Entrezgene-Set XML, which is the normal
situation.

  -x  Extract .ags -> text .agc [T/F]  Optional
    default = F

To accommodate existing programs, the -x argument will convert .ags files
to the catenated Entrezgene text ASN.1 files that were previously
distributed.

  -y  Combine .agc -> text .ags (for testing) [T/F]  Optional
    default = F

  -z  Combine .agc -> binary .ags, then gzip [T/F]  Optional
    default = F

NCBI uses gene2xml with the -y or -z arguments to process internal data
into the compressed binary Entrezgene-Set ASN.1 files that are placed on
the NCBI ftp site. It is not expected that anyone outside of NCBI would use
these arguments.

A sample record that illustrates the structure of Entrezgene-Set XML is
shown below. Ellipses (...) are used where blocks of text have been removed
for brevity in this documentation.

<?xml version="1.0"?>
<!DOCTYPE Entrezgene-Set PUBLIC "-//NCBI//NCBI Entrezgene/EN"
 "http://www.ncbi.nlm.nih.gov/dtd/NCBI_Entrezgene.dtd">
<Entrezgene-Set>
  <Entrezgene>
    <Entrezgene_track-info>
      <Gene-track>
        <Gene-track_geneid>2652</Gene-track_geneid>
        <Gene-track_status value="live">0</Gene-track_status>
        <Gene-track_create-date>
          <Date>
            <Date_std>
              <Date-std>
                <Date-std_year>2003</Date-std_year>
                <Date-std_month>8</Date-std_month>
                <Date-std_day>28</Date-std_day>
                <Date-std_hour>20</Date-std_hour>
                <Date-std_minute>30</Date-std_minute>
                <Date-std_second>0</Date-std_second>
              </Date-std>
            </Date_std>
          </Date>
        </Gene-track_create-date>
        <Gene-track_update-date>
          <Date>
            <Date_std>
              <Date-std>
                <Date-std_year>2005</Date-std_year>
                <Date-std_month>4</Date-std_month>
                <Date-std_day>27</Date-std_day>
                <Date-std_hour>21</Date-std_hour>
                <Date-std_minute>45</Date-std_minute>
                <Date-std_second>0</Date-std_second>
              </Date-std>
            </Date_std>
          </Date>
        </Gene-track_update-date>
      </Gene-track>
    </Entrezgene_track-info>
    <Entrezgene_type value="protein-coding">6</Entrezgene_type>
    <Entrezgene_source>
      <BioSource>
        <BioSource_genome value="genomic">1</BioSource_genome>
        <BioSource_origin value="natural">1</BioSource_origin>
        <BioSource_org>
          <Org-ref>
            <Org-ref_taxname>Homo sapiens</Org-ref_taxname>
            <Org-ref_common>human</Org-ref_common>
            <Org-ref_db>
              <Dbtag>
                <Dbtag_db>taxon</Dbtag_db>
                <Dbtag_tag>
                  <Object-id>
                    <Object-id_id>9606</Object-id_id>
                  </Object-id>
                </Dbtag_tag>
              </Dbtag>
            </Org-ref_db>
            <Org-ref_syn>
              <Org-ref_syn_E>man</Org-ref_syn_E>
            </Org-ref_syn>
            <Org-ref_orgname>
              <OrgName>
                <OrgName_name>
                  <OrgName_name_binomial>
                    <BinomialOrgName>
                      <BinomialOrgName_genus>Homo</BinomialOrgName_genus>
                      <BinomialOrgName_species>sapiens
                      </BinomialOrgName_species>
                    </BinomialOrgName>
                  </OrgName_name_binomial>
                </OrgName_name>
                <OrgName_lineage>Eukaryota; Metazoa; Chordata; Craniata;
                Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates;
                Catarrhini; Hominidae; Homo</OrgName_lineage>
                <OrgName_gcode>1</OrgName_gcode>
                <OrgName_mgcode>2</OrgName_mgcode>
                <OrgName_div>PRI</OrgName_div>
              </OrgName>
            </Org-ref_orgname>
          </Org-ref>
        </BioSource_org>
        <BioSource_subtype>
          <SubSource>
            <SubSource_subtype value="chromosome">1</SubSource_subtype>
            <SubSource_name>X</SubSource_name>
          </SubSource>
        </BioSource_subtype>
      </BioSource>
    </Entrezgene_source>
    <Entrezgene_gene>
      <Gene-ref>
        <Gene-ref_locus>OPN1MW</Gene-ref_locus>
        <Gene-ref_desc>opsin 1 (cone pigments), medium-wave-sensitive (color
        blindness, deutan)</Gene-ref_desc>
        <Gene-ref_maploc>Xq28</Gene-ref_maploc>
        <Gene-ref_db>
          <Dbtag>
            <Dbtag_db>MIM</Dbtag_db>
            <Dbtag_tag>
              <Object-id>
                <Object-id_id>303800</Object-id_id>
              </Object-id>
            </Dbtag_tag>
          </Dbtag>
        </Gene-ref_db>
        <Gene-ref_syn>
          <Gene-ref_syn_E>CBD</Gene-ref_syn_E>
          <Gene-ref_syn_E>DCB</Gene-ref_syn_E>
          <Gene-ref_syn_E>GCP</Gene-ref_syn_E>
          <Gene-ref_syn_E>CBBM</Gene-ref_syn_E>
        </Gene-ref_syn>
        <Gene-ref_locus-tag>HGNC:4206</Gene-ref_locus-tag>
      </Gene-ref>
    </Entrezgene_gene>
    <Entrezgene_prot>
      <Prot-ref>
        <Prot-ref_name>
          <Prot-ref_name_E>opsin 1 (cone pigments), medium-wave-sensitive
          (color blindness, deutan)</Prot-ref_name_E>
          <Prot-ref_name_E>green cone pigment</Prot-ref_name_E>
        </Prot-ref_name>
      </Prot-ref>
    </Entrezgene_prot>
    <Entrezgene_location>
      <Maps>
        <Maps_display-str>Xq28</Maps_display-str>
        <Maps_method>
          <Maps_method_map-type value="cyto"/>
        </Maps_method>
      </Maps>
    </Entrezgene_location>
    <Entrezgene_gene-source>
      <Gene-source>
        <Gene-source_src>LocusLink</Gene-source_src>
        <Gene-source_src-int>2652</Gene-source_src-int>
        <Gene-source_src-str2>2652</Gene-source_src-str2>
      </Gene-source>
    </Entrezgene_gene-source>
    <Entrezgene_locus>
      <Gene-commentary>
        <Gene-commentary_type value="genomic">1</Gene-commentary_type>
        <Gene-commentary_heading>Reference</Gene-commentary_heading>
        <Gene-commentary_accession>NC_000023</Gene-commentary_accession>
        <Gene-commentary_version>8</Gene-commentary_version>
        <Gene-commentary_seqs>
          <Seq-loc>
            <Seq-loc_int>
              <Seq-interval>
                <Seq-interval_from>152969013</Seq-interval_from>
                <Seq-interval_to>152982377</Seq-interval_to>
                <Seq-interval_strand>
                  <Na-strand value="plus"/>
                </Seq-interval_strand>
                <Seq-interval_id>
                  <Seq-id>
                    <Seq-id_gi>51511752</Seq-id_gi>
                  </Seq-id>
                </Seq-interval_id>
              </Seq-interval>
            </Seq-loc_int>
          </Seq-loc>
        </Gene-commentary_seqs>
        <Gene-commentary_products>
          <Gene-commentary>
            <Gene-commentary_type value="mRNA">3</Gene-commentary_type>
            <Gene-commentary_heading>Reference</Gene-commentary_heading>
            <Gene-commentary_accession>NM_000513</Gene-commentary_accession>
            <Gene-commentary_version>1</Gene-commentary_version>
            <Gene-commentary_genomic-coords>
              <Seq-loc>
                <Seq-loc_mix>
                  <Seq-loc-mix>
                    <Seq-loc>
                      <Seq-loc_int>
                        <Seq-interval>
                          <Seq-interval_from>152969013</Seq-interval_from>
                          <Seq-interval_to>152969124</Seq-interval_to>
                          <Seq-interval_strand>
                            <Na-strand value="plus"/>
                          </Seq-interval_strand>
                          <Seq-interval_id>
                            <Seq-id>
                              <Seq-id_gi>51511752</Seq-id_gi>
                            </Seq-id>
                          </Seq-interval_id>
                        </Seq-interval>
                      </Seq-loc_int>
                    </Seq-loc>
                    ...
                  </Seq-loc-mix>
                </Seq-loc_mix>
              </Seq-loc>
            </Gene-commentary_genomic-coords>
            <Gene-commentary_seqs>
              <Seq-loc>
                <Seq-loc_whole>
                  <Seq-id>
                    <Seq-id_gi>4503964</Seq-id_gi>
                  </Seq-id>
                </Seq-loc_whole>
              </Seq-loc>
            </Gene-commentary_seqs>
            <Gene-commentary_products>
              <Gene-commentary>
                <Gene-commentary_type value="peptide">8</Gene-commentary_type>
                <Gene-commentary_heading>Reference</Gene-commentary_heading>
                <Gene-commentary_accession>NP_000504
                </Gene-commentary_accession>
                <Gene-commentary_version>1</Gene-commentary_version>
                <Gene-commentary_genomic-coords>
                  <Seq-loc>
                    <Seq-loc_packed-int>
                      <Packed-seqint>
                        <Seq-interval>
                          <Seq-interval_from>152969013</Seq-interval_from>
                          <Seq-interval_to>152969124</Seq-interval_to>
                          <Seq-interval_strand>
                            <Na-strand value="plus"/>
                          </Seq-interval_strand>
                          <Seq-interval_id>
                            <Seq-id>
                              <Seq-id_gi>51511752</Seq-id_gi>
                            </Seq-id>
                          </Seq-interval_id>
                        </Seq-interval>
                        ...
                      </Packed-seqint>
                    </Seq-loc_packed-int>
                  </Seq-loc>
                </Gene-commentary_genomic-coords>
                <Gene-commentary_seqs>
                  <Seq-loc>
                    <Seq-loc_whole>
                      <Seq-id>
                        <Seq-id_gi>4503965</Seq-id_gi>
                      </Seq-id>
                    </Seq-loc_whole>
                  </Seq-loc>
                </Gene-commentary_seqs>
              </Gene-commentary>
            </Gene-commentary_products>
          </Gene-commentary>
        </Gene-commentary_products>
      </Gene-commentary>
      ...
    </Entrezgene_locus>
    <Entrezgene_properties>
      <Gene-commentary>
        <Gene-commentary_type value="comment">254</Gene-commentary_type>
        <Gene-commentary_label>Nomenclature</Gene-commentary_label>
        <Gene-commentary_source>
          <Other-source>
            <Other-source_anchor>HUGO Gene Nomenclature Committee
            </Other-source_anchor>
          </Other-source>
        </Gene-commentary_source>
        <Gene-commentary_properties>
          <Gene-commentary>
            <Gene-commentary_type value="property">16</Gene-commentary_type>
            <Gene-commentary_label>Official Symbol</Gene-commentary_label>
            <Gene-commentary_text>OPN1MW</Gene-commentary_text>
          </Gene-commentary>
          <Gene-commentary>
            <Gene-commentary_type value="property">16</Gene-commentary_type>
            <Gene-commentary_label>Official Full Name</Gene-commentary_label>
            <Gene-commentary_text>opsin 1 (cone pigments),
            medium-wave-sensitive (color blindness, deutan)</Gene-commentary_text>
          </Gene-commentary>
        </Gene-commentary_properties>
      </Gene-commentary>
      ...
    </Entrezgene_properties>
    <Entrezgene_comments>
      <Gene-commentary>
        <Gene-commentary_type value="comment">254</Gene-commentary_type>
        <Gene-commentary_heading>LocusTagLink</Gene-commentary_heading>
        <Gene-commentary_source>
          <Other-source>
            <Other-source_src>
              <Dbtag>
                <Dbtag_db>HGNC</Dbtag_db>
                <Dbtag_tag>
                  <Object-id>
                    <Object-id_id>4206</Object-id_id>
                  </Object-id>
                </Dbtag_tag>
              </Dbtag>
            </Other-source_src>
          </Other-source>
        </Gene-commentary_source>
      </Gene-commentary>
      ...
    </Entrezgene_comments>
    <Entrezgene_unique-keys>
      <Dbtag>
        <Dbtag_db>LocusID</Dbtag_db>
        <Dbtag_tag>
          <Object-id>
            <Object-id_id>2652</Object-id_id>
          </Object-id>
        </Dbtag_tag>
      </Dbtag>
      <Dbtag>
        <Dbtag_db>MIM</Dbtag_db>
        <Dbtag_tag>
          <Object-id>
            <Object-id_id>303800</Object-id_id>
          </Object-id>
        </Dbtag_tag>
      </Dbtag>
    </Entrezgene_unique-keys>
    <Entrezgene_xtra-index-terms>
      <Entrezgene_xtra-index-terms_E>LOC2652</Entrezgene_xtra-index-terms_E>
    </Entrezgene_xtra-index-terms>
    <Entrezgene_xtra-properties>
      <Xtra-Terms>
        <Xtra-Terms_tag>PROP</Xtra-Terms_tag>
        <Xtra-Terms_value>phenotype</Xtra-Terms_value>
      </Xtra-Terms>
    </Entrezgene_xtra-properties>
  </Entrezgene>
</Entrezgene-Set>
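
The records in an .xgs file can be read back with any XML parser. The
following is a minimal sketch in Python, using only the standard library,
that streams through an Entrezgene-Set file and reports the gene ID, locus
symbol, and taxon ID of each record. The element paths are taken from the
sample record above. The names iter_genes and _text are illustrative and
are not part of gene2xml, and the sketch assumes every record follows the
layout of the sample.

```python
import xml.etree.ElementTree as ET

# Element paths relative to <Entrezgene>, copied from the sample record.
GENE_ID = "Entrezgene_track-info/Gene-track/Gene-track_geneid"
SYMBOL = "Entrezgene_gene/Gene-ref/Gene-ref_locus"
ORG_DBTAGS = "Entrezgene_source/BioSource/BioSource_org/Org-ref/Org-ref_db/Dbtag"


def _text(elem, path):
    """Return the stripped text at path, or None if the element is absent."""
    value = elem.findtext(path)
    # Some values are wrapped across lines, so strip surrounding whitespace.
    return value.strip() if value is not None else None


def iter_genes(source):
    """Yield (gene_id, symbol, taxon_id) for each <Entrezgene> record.

    source may be a file name or a binary file object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "Entrezgene":
            continue
        taxon_id = None
        for dbtag in elem.iterfind(ORG_DBTAGS):
            if _text(dbtag, "Dbtag_db") == "taxon":
                taxon_id = _text(dbtag, "Dbtag_tag/Object-id/Object-id_id")
        yield _text(elem, GENE_ID), _text(elem, SYMBOL), taxon_id
        elem.clear()  # drop the finished record's children to limit memory use
```

For example, iterating over iter_genes("cats.xgs") and keeping only the
tuples whose third item is "9685" gives a check on the output of the -t
filter shown earlier.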