NCBI C Toolkit Cross Reference: C/doc/asn2gb.txt
GENBANK FLATFILE GENERATOR

A new flatfile generator has been written to replace the old asn2ff code.
It is provided both as a stand-alone application, asn2gb, and as a pair of
C functions in the NCBI software toolkit. There are several command-line
arguments, with equivalent function parameters, that customize the behavior
of the new flatfile generator and optimize its performance.

NCBI maintains the GenBank nucleotide sequence database, and is part of the
International Nucleotide Sequence Database (INSD) collaboration. The list
of biological features and qualifiers approved by the collaborators for
official release and exchange of GenBank, EMBL, and DDBJ records can be
found at http://www.ncbi.nlm.nih.gov/projects/collab/FT/index.html

NCBI converts all direct sequence submissions, as well as records supplied
by our collaborators and other data sources, into a data model specified in
Abstract Syntax Notation 1 (ASN.1) format, regardless of the original form
of the data. From this common data representation we can generate GenBank
or FASTA files, populate BLAST sequence databases, or build indices for the
Entrez retrieval system.

Since the ASN.1 data is structured for ease of computation, asn2gb and the
underlying toolkit functions are able to provide useful derived information
at the request of the user. For example, the sequence of bases encoding an
mRNA feature can be presented in a /transcription qualifier. Because this
is not an INSD-approved qualifier, it will not be present in the official
flatfile release of a record. It can only be provided as an extension of
the collaboration-approved format, such as in "Sequin mode" below.


ASN2GB STANDALONE APPLICATION

An asn2gb executable is now available on all platforms, and is distributed
in the asn1-converters area of the NCBI public ftp site. The most commonly
used arguments are explained below.
A more detailed discussion of various
parameter values is in the section on calling the SeqEntryToGnbk function
from within your own program code.

The input and output files default to stdin and stdout, respectively, if
they are not specified on the command line.

  -i  Input File Name
  -o  Output File Name

GenBank format produces the conventional GenBank flatfile for nucleotide
sequences. GenPept is the equivalent for protein sequences. INSDSet is a set
of one or more INSDSeq elements, which is an XML structured view of the
information in the flatfile. INSDSeq contains additional fields, derived
from the underlying data, provided as a convenience for computing on the
sequences and their feature annotations.

  -f  Format (b GenBank, e EMBL, p GenPept, t Feature Table, x INSDSet)

Sequin mode produces a relaxed flatfile that allows unapproved qualifiers
and database cross-references present in the record to be shown. It is
typically used while constructing a sequence record for submission. The
stricter modes are used for official GenBank releases and display of
flatfiles on the Entrez web site.

  -m  Mode (r Release, e Entrez, s Sequin, d Dump)

Normal style examines each record for "far" references (accessions not
packaged along with the sequence being formatted), and shows the CONTIG
block instead of separately fetching the underlying components. Master
style forces the components to be fetched and displays the actual sequence
letters.

  -s  Style (n Normal, s Segment, m Master, c Contig)

Bit flags and custom flags modify the appearance of the flatfile, such as
adding /transcription and /peptide extended qualifiers, eliminating the
need for writing programs to extract these subsequences. Locks are used to
preload "far" sequence components or look up their accession numbers, and
can greatly speed up processing of records with far references.
The values
are decimal numbers generated by combination of the appropriate binary
bits, which are described in the section on calling the SeqEntryToGnbk
function.

  -g  Bit Flags (1 HTML, 2 XML, 4 ContigFeats, 8 ContigSrcs, 16 FarTransl)
  -h  Lock/Lookup Flags (8 LockProd, 16 LookupComp, 64 LookupProd)
  -u  Custom Flags (2 HideMostImpFeats, 4 HideSnpFeats)

Batch processing of Bioseq-set ASN.1 release files is also supported. The
gb*.aso.gz compressed binary files from the NCBI public ftp site can also
be uncompressed on-the-fly on UNIX platforms.

  -a  ASN.1 Type
        Single Record: a Any, e Seq-entry, b Bioseq, s Bioseq-set, m Seq-submit
        Release File: t Batch Bioseq-set, u Batch Seq-submit
  -b  Bioseq-set is Binary [T/F]
  -c  Bioseq-set is Compressed [T/F]
  -p  Propagate Top Descriptors [T/F]

  -l  Log file

(Release files package many independent Seq-entry objects in a Bioseq-set.
Using -a t causes asn2gb to read one component at a time, processing it and
freeing it from memory before reading the next one. Otherwise it would try
to process the entire file at once, almost certainly running out of memory.)

Remote fetching allows accession lookups and fetching of far components
from the NCBI network server. An indicated accession can also be fetched
for formatting.

  -r  Remote Fetching [T/F]
  -A  Accession to Fetch


SEQENTRYTOGNBK FUNCTION

The NCBI software toolkit provides flatfile generation functions for
programmers to incorporate into their own computer applications.

SeqEntryToGnbk takes a SeqEntryPtr or SeqLocPtr and calls asn2gnbk_setup,
asn2gnbk_format, and asn2gnbk_cleanup, which are available from a private
header. It returns FALSE if there was a problem generating the flatfile.
BioseqToGnbk is simply a convenience function that takes a BioseqPtr, looks
up the parent SeqEntryPtr, and then calls SeqEntryToGnbk.
To use these
functions, #include <asn2gnbk.h> in your program code.

  NLM_EXTERN Boolean SeqEntryToGnbk (
    SeqEntryPtr sep,
    SeqLocPtr slp,
    FmtType format,
    ModType mode,
    StlType style,
    FlgType flags,
    LckType locks,
    CstType custom,
    XtraPtr extra,
    FILE *fp
  );

  NLM_EXTERN Boolean BioseqToGnbk (
    BioseqPtr bsp,
    SeqLocPtr slp,
    FmtType format,
    ModType mode,
    StlType style,
    FlgType flags,
    LckType locks,
    CstType custom,
    XtraPtr extra,
    FILE *fp
  );

In the asn2gb application, format, mode, style, flags, locks, and custom
parameters are specified by the -f, -m, -s, -g, -h and -u arguments,
respectively.


FORMATS include GENBANK_FMT, EMBL_FMT, GENPEPT_FMT, and FTABLE_FMT
(Sequin's 5-column parsable feature table). If the SeqEntryPtr argument
passed to SeqEntryToGnbk points to a Bioseq-set, the function processes all
Bioseqs of the appropriate molecule type (nucleotide or protein) for the
specified format.


MODES are RELEASE_MODE, ENTREZ_MODE (release mode strictness except that it
allows local IDs and does not require a valid CDS /protein_id accession),
SEQUIN_MODE, and DUMP_MODE. RefSeq records can have certain qualifiers
(e.g., /transcript_id) and db_xrefs show up in release mode beyond those
approved by INSD agreement. Entrez mode is used for web display, and can
show new elements that haven't yet finished their 4-month quarantine period.


STYLES are NORMAL_STYLE, SEGMENT_STYLE, MASTER_STYLE, and CONTIG_STYLE.
Segment style is the traditional representation of segmented sequences,
while contig style displays a CONTIG line with a join of accessions instead
of a sequence. Normal style automatically chooses between segment and
contig style, depending upon the kind of data. (Near segmented records will
be done in segment style.
Far segmented sequences or delta sequences with
no literals will be done as if you chose contig style.) Master style shows
features mapped to the segmented Bioseq's coordinates.


FLAGS are bit flags controlling appearance or behavior, and are ORed
together.

One 2-bit flag tells asn2gnbk to create HTML with web links, flatfile in
XML form, or flatfile in ASN.1 form. These settings are mutually exclusive.
The setup for creating HTML links is within SeqEntryToGnbk itself.

  #define CREATE_HTML_FLATFILE        1
  #define CREATE_XML_GBSEQ_FILE       2
  #define CREATE_ASN_GBSEQ_FILE       3

Others control feature display behavior in contig style, whether it was
explicitly chosen or was called when a far segmented or far delta record
was processed in normal style.

  #define SHOW_CONTIG_FEATURES        4
  #define SHOW_CONTIG_SOURCES         8

A 2-bit flag set controls translation of CDS features with far products.

  #define SHOW_FAR_TRANSLATION       16
  #define TRANSLATE_IF_NO_PRODUCT    32
  #define ALWAYS_TRANSLATE_CDS       48

The same set of flags also applies to transcription of mRNA features with
far products if the SHOW_TRANCRIPTION flag is also set.

  #define SHOW_FAR_TRANSCRIPTION     16
  #define TRANSCRIBE_IF_NO_PRODUCT   32
  #define ALWAYS_TRANSCRIBE_MRNA     48

Any record can be shown with RefSeq policies for exception, source, and
other qualifiers, values, and db_xrefs that are not necessarily part of the
INSD agreement.

  #define REFSEQ_CONVENTIONS         64

Another 2-bit flag controls where to get features when using far segmented
parts or far component delta Bioseqs.

  #define ONLY_NEAR_FEATURES        128
  #define FAR_FEATURES_SUPPRESS     256
  #define NEAR_FEATURES_SUPPRESS    384

Other flags allow customization of reports from genomic product sets.

  #define COPY_GPS_CDS_UP           512
  #define COPY_GPS_GENE_DOWN       1024

The CONTIG block can be shown along with the sequence block in master or
segment style, when appropriate.

  #define SHOW_CONTIG_AND_SEQ      2048

mRNAs and peptide features can show /transcription or /peptide sequence
qualifiers. This is most useful when generating INSDSeq XML so users do not
have to compute on the data themselves.

  #define SHOW_TRANCRIPTION        4096
  #define SHOW_PEPTIDE             8192

GBSeq XML has been replaced by INSDSeq XML. The CREATE_XML_GBSEQ_FILE flag
will actually produce INSDSeq. The original GBSeq can be generated during
the transition period by adding the following flag.

  #define PRODUCE_OLD_GBSEQ       16384

Still others are expected to be rarely used, or are for testing new features.

  #define DDBJ_VARIANT_FORMAT     32768
  #define SPECIAL_GAP_DISPLAY     65536
  #define FORCE_PRIMARY_BLOCK    131072


LOCKS are bits for controlling program performance, and are also ORed
together.

One flag set is for locking far segmented or delta components, far feature
location Bioseqs, or far feature product Bioseqs in advance. This prevents
the object manager from uncaching components at an inopportune time,
causing unnecessary thrashing. Far component Bioseqs are needed for
displaying the sequence.

  #define LOCK_FAR_COMPONENTS         2
  #define LOCK_FAR_LOCATIONS          4
  #define LOCK_FAR_PRODUCTS           8

Another set attempts to do bulk accession to gi lookups in advance, which
is possible if PubSeqFetchEnable was called by the application. Remote
fetching in asn2gb uses this new access mechanism. Far component IDs are
needed for the CONTIG line, far location IDs for feature location joins,
and far product IDs for the /protein_id and /transcript_id accessions.

  #define LOOKUP_FAR_COMPONENTS      16
  #define LOOKUP_FAR_LOCATIONS       32
  #define LOOKUP_FAR_PRODUCTS        64
  #define LOOKUP_FAR_HISTORY        128
  #define LOOKUP_FAR_INFERENCE      256
  #define LOOKUP_FAR_OTHERS         512

To use PubSeqFetchEnable, the application should #include <pmfapi.h>.


CUSTOM are bit flags suppressing specific features, and are also ORed
together.

One set enables display of statistics for features and references.

  #define SHOW_FEATURE_STATS          1
  #define SHOW_REFERENCE_STATS        2

Another set suppresses common feature types or all features.

  #define HIDE_FEATURES               4

  #define HIDE_IMP_FEATS              8
  #define HIDE_VARS_AND_REPT_REGNS   16
  #define HIDE_SITES_BONDS_REGIONS   32
  #define HIDE_CDD_FEATS             64
  #define HIDE_CDS_PROD_FEATS       128

A 3-bit flag controls selective display of GeneRIF references or review
articles in RefSeq records.

  #define HIDE_GENE_RIFS            256
  #define ONLY_GENE_RIFS            512
  #define ONLY_REVIEW_PUBS          768
  #define NEWEST_PUBS              1024
  #define OLDEST_PUBS              1280
  #define HIDE_ALL_PUBS            1792

Protein feature tables and References in feature tables can also be shown.

  #define SHOW_PROT_FTABLE         2048
  #define SHOW_FTABLE_REFS         4096

Source features, instantiated Gap features, and the sequence itself can
also be suppressed.

  #define HIDE_SOURCE_FEATS        8192
  #define HIDE_GAP_FEATS          16384
  #define HIDE_SEQUENCE           32768

Gaps in far delta sequences in Web Entrez are normally converted to a
shorthand notation. These can be forced to expand to runs of Ns.

  #define EXPANDED_GAP_DISPLAY    65536

Gene Ontology terms can be suppressed if desired.

  #define HIDE_GO_TERMS          131072

The CDS /translation can also be suppressed, even with near products.

  #define HIDE_TRANSLATION       262144

Evidence qualifiers, including experiment and inference, can be suppressed.

  #define HIDE_EVIDENCE_QUALS    524288


EXTRA is an opaque pointer used for preparing internal NCBI indices. Most
programs will pass NULL for this parameter.


SAMPLE GENBANK FLATFILE

A sample genomic sequence encoding a spliced mRNA is shown below in GenBank
format. The exon features in the original record have been removed from
this example.

LOCUS       AF012431                2141 bp    DNA     linear   ROD 07-FEB-2000
DEFINITION  Mus musculus D-dopachrome tautomerase (Ddt) gene, complete cds.
ACCESSION   AF012431
VERSION     AF012431.1  GI:2352907
KEYWORDS    .
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 2141)
  AUTHORS   Esumi,N., Budarf,M., Ciccarelli,L., Sellinger,B., Kozak,C.A. and
            Wistow,G.
  TITLE     Conserved gene structure and genomic linkage for D-dopachrome
            tautomerase (DDT) and MIF
  JOURNAL   Mamm. Genome 9 (9), 753-757 (1998)
   PUBMED   9716662
REFERENCE   2  (bases 1 to 2141)
  AUTHORS   Esumi,N. and Wistow,G.
  TITLE     Direct Submission
  JOURNAL   Submitted (03-JUL-1997) Molecular Structure and Function, NEI,
            Building 6, Rm.
            331, NIH, Bethesda, MD 20892, USA
FEATURES             Location/Qualifiers
     source          1..2141
                     /organism="Mus musculus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:10090"
                     /chromosome="10"
     gene            1..2141
                     /gene="Ddt"
     mRNA            join(1..159,462..637,1868..2141)
                     /gene="Ddt"
                     /product="D-dopachrome tautomerase"
     CDS             join(52..159,462..637,1868..1940)
                     /gene="Ddt"
                     /note="related to macrophage migration inhibitory factor
                     (MIF); in vitro activity on D-dopachrome"
                     /codon_start=1
                     /product="D-dopachrome tautomerase"
                     /protein_id="AAC77467.1"
                     /db_xref="GI:2352908"
                     /translation="MPFVELETNLPASRIPAGLENRLCAATATILDKPEDRVSVTIRP
                     GMTLLMNKSTEPCAHLLVSSIGVVGTAEQNRTHSASFFKFLTEELSLDQDRIVIRFFP
                     LEAWQIGKKGTVMTFL"
BASE COUNT      473 a    567 c    570 g    531 t
ORIGIN
        1 agctcacccg gtgcagttac cgtttggcga tcccactctt ctcccgctaa catgccattc
       61 gttgagttgg aaacaaactt gccggctagc cgcatacccg cggggctgga gaaccggctg
      121 tgtgcggcca cagccaccat cctggacaaa cccgaagacg tgagtgaggg tcggcgagaa
      181 cttgtgggct agggtcggac ctcccaatga cccgttccca tccccaggga ccccactccc
      241 ctggtaacct ctgaccttcc gtgtcctatc ctcccttcct agatcccttc ctggttgtct
      301 ttcccaggcg tgaccctgac gtgactgact cccaaggatc ctgggcagtg tcccagaccc
      361 ggagccctcg gacccccacg ttaagaattt ggcgcgcccc cttctgaaca ccagccatgc
      421 cctgcccaag cttcaggatt taacttttgt gttccttgca gcgcgtgagc gttacgatac
      481 gacctggcat gaccctgttg atgaacaaat ccacagagcc ttgtgctcac cttctggtct
      541 cttccatcgg ggttgtgggc accgcggagc agaaccgcac tcacagcgcc agcttcttca
      601 agttcctcac cgaggagctg tccctggacc aggaccggta tgcagggcca gtgagggaac
      661 gtatttgtgc gtctggagtc aggactcagt ctctctgtat gaggttgggg ggggggaggg
      721 gtcactattt gctggttcca gaaagcactc agtgtccttg tccacgaagg tggactcctc
      781 aggcactgga atggtgagtc tgtgatcaga atgatagcaa gatttcaatt ccttcgactc
      841 tctacagccc cgagaaagga tggtttggga agccccagtg ttgtcttgtg tgtactgaga
      901 atctacttag gcaccctctt aaccactgtg atagtggcct cctcaccgtc actgaaccag
      961 ggggtctggt tttttaaggg agaacttttc caggctggtc cgagggaatc tggttgtgtc
     1021 ctgaggcaga taacctttga actagataag gctccgggag agttgctgga tgataaaaag
     1081 acctccccca caaggtgacc ctaccctccc ccctccccat ccttacattc tgaggcagag
     1141 ttagagtctc atattcctga ggctggagcg ggcctgtgaa gaactacgga gataagtttg
     1201 aaagagcctt ccaaaatgga gtcctagtgg gctcaggaaa gttggtattg gctgcttttg
     1261 ttggatgctc aaatgctgtc ctttagttga ggggacaata cttcttaacg gtaatgctcg
     1321 tgcacacagc acagggcaga tttggtagct tcctgacata gataactgta ttgggccagt
     1381 tttacagatg gaaacctgag ggtgtcagcc ctgtgcacaa ccaccctggt gccagacgat
     1441 cgccagggac ttcctctgag tcctgtgatt gagcaattgc tgattcccac agatttgaat
     1501 cagatttgaa cctgcgcctc acttagagct gggctttggt tcaaaactaa gtgcctggta
     1561 ccctgggcac gcctttagga gcatgcagtt agttagaagc agggggactg tttgttagcc
     1621 cgtaagcagc ctaacatgct cacctgagca cagagcacag gtattgaagc cattgcgtta
     1681 agtctgcact gggaccggta tagccatcac ctttcttctg acttgtcttt ggtgcaagga
     1741 tcattagctg gggtgggcag attggcaaaa tatcctgcag gctgatatgg gctggcctgt
     1801 ctggcaggga ccttaacaaa tgaggggtgt atgcaggagt tgacatctct ccttcttcct
     1861 cctaaaggat cgttatccgc ttcttcccct tggaggcttg gcagatcgga aagaaaggaa
     1921 ctgtcatgac atttctgtga cggaaacaaa gaacccaggg tgtttgctcg aaccgggcca
     1981 gagcccttcc agagaggccc tcccggcaga atcgtggcct ggtagatagg atggtaaatc
     2041 cctcttttgc ctaaacgtct gcgacttcag tggtccattt ttctcttccc cagcctcgtg
     2101 aataattgaa agagagcaaa taaatgaaga gaatatcatt c
//


SAMPLE INSDSET XML

The same record is shown in INSDSet XML format. INSDSeq XML is a data
distribution format meant to be read by a computer, not a display format
intended for human reading, so sequence letters are single strings of
characters with no spaces or newlines. (The sequences and other long lines
are word-wrapped here only for printing.)

The INSDFeature_location is the string displayed exactly as it was in the
GenBank flatfile.

  join(1..159,462..637,1868..2141)

For the convenience of users who wish to compute on features without having
to parse these strings, the individual feature intervals are also presented
separately.

  <INSDFeature_intervals>
    <INSDInterval>
      <INSDInterval_from>1</INSDInterval_from>
      <INSDInterval_to>159</INSDInterval_to>
      <INSDInterval_accession>AF012431.1</INSDInterval_accession>
    </INSDInterval>
    <INSDInterval>
      <INSDInterval_from>462</INSDInterval_from>
      <INSDInterval_to>637</INSDInterval_to>
      <INSDInterval_accession>AF012431.1</INSDInterval_accession>
    </INSDInterval>
    <INSDInterval>
    ...

The record here was generated with the SHOW_TRANCRIPTION flag set to
extract the bases under the mRNA feature interval and display them in a
transcription qualifier. This eliminates the need to process feature
intervals for the common task of obtaining the mRNA bases. SHOW_PEPTIDE
does the same for extracting peptide sequences from under sig_peptide or
mat_peptide features. The transcription and peptide qualifiers are
extensions of those approved by the INSD for official releases.

<?xml version="1.0"?>
<!DOCTYPE INSDSet PUBLIC "-//NCBI//INSD INSDSeq/EN"
"http://www.ncbi.nlm.nih.gov/INSD_INSDSeq.dtd">
<INSDSet>
  <INSDSeq>
    <INSDSeq_locus>AF012431</INSDSeq_locus>
    <INSDSeq_length>2141</INSDSeq_length>
    <INSDSeq_moltype>DNA</INSDSeq_moltype>
    <INSDSeq_topology>linear</INSDSeq_topology>
    <INSDSeq_division>ROD</INSDSeq_division>
    <INSDSeq_update-date>07-FEB-2000</INSDSeq_update-date>
    <INSDSeq_create-date>03-SEP-1997</INSDSeq_create-date>
    <INSDSeq_definition>Mus musculus D-dopachrome tautomerase (Ddt) gene,
complete cds</INSDSeq_definition>
    <INSDSeq_primary-accession>AF012431</INSDSeq_primary-accession>
    <INSDSeq_accession-version>AF012431.1</INSDSeq_accession-version>
    <INSDSeq_other-seqids>
      <INSDSeqid>gb|AF012431.1|AF012431</INSDSeqid>
      <INSDSeqid>gi|2352907</INSDSeqid>
    </INSDSeq_other-seqids>
    <INSDSeq_source>Mus musculus (house mouse)</INSDSeq_source>
    <INSDSeq_organism>Mus musculus</INSDSeq_organism>
    <INSDSeq_taxonomy>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
Sciurognathi; Muridae; Murinae; Mus</INSDSeq_taxonomy>
    <INSDSeq_references>
      <INSDReference>
        <INSDReference_reference>1 (bases 1 to 2141)</INSDReference_reference>
        <INSDReference_authors>
          <INSDAuthor>Esumi,N.</INSDAuthor>
          <INSDAuthor>Budarf,M.</INSDAuthor>
          <INSDAuthor>Ciccarelli,L.</INSDAuthor>
          <INSDAuthor>Sellinger,B.</INSDAuthor>
          <INSDAuthor>Kozak,C.A.</INSDAuthor>
          <INSDAuthor>Wistow,G.</INSDAuthor>
        </INSDReference_authors>
        <INSDReference_title>Conserved gene structure and genomic linkage for
D-dopachrome tautomerase (DDT) and MIF</INSDReference_title>
        <INSDReference_journal>Mamm.
Genome 9 (9), 753-757 (1998)
</INSDReference_journal>
        <INSDReference_pubmed>9716662</INSDReference_pubmed>
      </INSDReference>
      <INSDReference>
        <INSDReference_reference>2 (bases 1 to 2141)</INSDReference_reference>
        <INSDReference_authors>
          <INSDAuthor>Esumi,N.</INSDAuthor>
          <INSDAuthor>Wistow,G.</INSDAuthor>
        </INSDReference_authors>
        <INSDReference_title>Direct Submission</INSDReference_title>
        <INSDReference_journal>Submitted (03-JUL-1997) Molecular Structure and
Function, NEI, Building 6, Rm. 331, NIH, Bethesda, MD 20892, USA
</INSDReference_journal>
      </INSDReference>
    </INSDSeq_references>
    <INSDSeq_feature-table>
      <INSDFeature>
        <INSDFeature_key>source</INSDFeature_key>
        <INSDFeature_location>1..2141</INSDFeature_location>
        <INSDFeature_quals>
          <INSDQualifier>
            <INSDQualifier_name>organism</INSDQualifier_name>
            <INSDQualifier_value>Mus musculus</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>mol_type</INSDQualifier_name>
            <INSDQualifier_value>genomic DNA</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>db_xref</INSDQualifier_name>
            <INSDQualifier_value>taxon:10090</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>chromosome</INSDQualifier_name>
            <INSDQualifier_value>10</INSDQualifier_value>
          </INSDQualifier>
        </INSDFeature_quals>
      </INSDFeature>
      <INSDFeature>
        <INSDFeature_key>gene</INSDFeature_key>
        <INSDFeature_location>1..2141</INSDFeature_location>
        <INSDFeature_intervals>
          <INSDInterval>
            <INSDInterval_from>1</INSDInterval_from>
            <INSDInterval_to>2141</INSDInterval_to>
            <INSDInterval_accession>AF012431.1</INSDInterval_accession>
          </INSDInterval>
        </INSDFeature_intervals>
        <INSDFeature_quals>
          <INSDQualifier>
            <INSDQualifier_name>gene</INSDQualifier_name>
            <INSDQualifier_value>Ddt</INSDQualifier_value>
          </INSDQualifier>
        </INSDFeature_quals>
      </INSDFeature>
      <INSDFeature>
        <INSDFeature_key>mRNA</INSDFeature_key>
        <INSDFeature_location>join(1..159,462..637,1868..2141)
</INSDFeature_location>
        <INSDFeature_intervals>
          <INSDInterval>
            <INSDInterval_from>1</INSDInterval_from>
            <INSDInterval_to>159</INSDInterval_to>
            <INSDInterval_accession>AF012431.1</INSDInterval_accession>
          </INSDInterval>
          <INSDInterval>
            <INSDInterval_from>462</INSDInterval_from>
            <INSDInterval_to>637</INSDInterval_to>
            <INSDInterval_accession>AF012431.1</INSDInterval_accession>
          </INSDInterval>
          <INSDInterval>
            <INSDInterval_from>1868</INSDInterval_from>
            <INSDInterval_to>2141</INSDInterval_to>
            <INSDInterval_accession>AF012431.1</INSDInterval_accession>
          </INSDInterval>
        </INSDFeature_intervals>
        <INSDFeature_quals>
          <INSDQualifier>
            <INSDQualifier_name>gene</INSDQualifier_name>
            <INSDQualifier_value>Ddt</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>product</INSDQualifier_name>
            <INSDQualifier_value>D-dopachrome tautomerase</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>transcription</INSDQualifier_name>
            <INSDQualifier_value>AGCTCACCCGGTGCAGTTACCGTTTGGCGATCCCACTCTTCT
CCCGCTAACATGCCATTCGTTGAGTTGGAAACAAACTTGCCGGCTAGCCGCATACCCGCGGGGCTGGAGAACCGG
CTGTGTGCGGCCACAGCCACCATCCTGGACAAACCCGAAGACCGCGTGAGCGTTACGATACGACCTGGCATGACC
CTGTTGATGAACAAATCCACAGAGCCTTGTGCTCACCTTCTGGTCTCTTCCATCGGGGTTGTGGGCACCGCGGAG
CAGAACCGCACTCACAGCGCCAGCTTCTTCAAGTTCCTCACCGAGGAGCTGTCCCTGGACCAGGACCGGATCGTT
ATCCGCTTCTTCCCCTTGGAGGCTTGGCAGATCGGAAAGAAAGGAACTGTCATGACATTTCTGTGACGGAAACAA
AGAACCCAGGGTGTTTGCTCGAACCGGGCCAGAGCCCTTCCAGAGAGGCCCTCCCGGCAGAATCGTGGCCTGGTA
GATAGGATGGTAAATCCCTCTTTTGCCTAAACGTCTGCGACTTCAGTGGTCCATTTTTCTCTTCCCCAGCCTCGT
GAATAATTGAAAGAGAGCAAATAAATGAAGAGAATATCATTC</INSDQualifier_value>
          </INSDQualifier>
        </INSDFeature_quals>
      </INSDFeature>
      <INSDFeature>
        <INSDFeature_key>CDS</INSDFeature_key>
        <INSDFeature_location>join(52..159,462..637,1868..1940)
</INSDFeature_location>
        <INSDFeature_intervals>
          <INSDInterval>
            <INSDInterval_from>52</INSDInterval_from>
            <INSDInterval_to>159</INSDInterval_to>
            <INSDInterval_accession>AF012431.1</INSDInterval_accession>
          </INSDInterval>
          <INSDInterval>
            <INSDInterval_from>462</INSDInterval_from>
            <INSDInterval_to>637</INSDInterval_to>
            <INSDInterval_accession>AF012431.1</INSDInterval_accession>
          </INSDInterval>
          <INSDInterval>
            <INSDInterval_from>1868</INSDInterval_from>
            <INSDInterval_to>1940</INSDInterval_to>
            <INSDInterval_accession>AF012431.1</INSDInterval_accession>
          </INSDInterval>
        </INSDFeature_intervals>
        <INSDFeature_quals>
          <INSDQualifier>
            <INSDQualifier_name>gene</INSDQualifier_name>
            <INSDQualifier_value>Ddt</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>note</INSDQualifier_name>
            <INSDQualifier_value>related to macrophage migration inhibitory
factor (MIF); in vitro activity on D-dopachrome</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>codon_start</INSDQualifier_name>
            <INSDQualifier_value>1</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>transl_table</INSDQualifier_name>
            <INSDQualifier_value>1</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>product</INSDQualifier_name>
            <INSDQualifier_value>D-dopachrome tautomerase</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>protein_id</INSDQualifier_name>
            <INSDQualifier_value>AAC77467.1</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>db_xref</INSDQualifier_name>
            <INSDQualifier_value>GI:2352908</INSDQualifier_value>
          </INSDQualifier>
          <INSDQualifier>
            <INSDQualifier_name>translation</INSDQualifier_name>
            <INSDQualifier_value>MPFVELETNLPASRIPAGLENRLCAATATILDKPEDRVSVTI
RPGMTLLMNKSTEPCAHLLVSSIGVVGTAEQNRTHSASFFKFLTEELSLDQDRIVIRFFPLEAWQIGKKGTVMTF
L</INSDQualifier_value>
          </INSDQualifier>
        </INSDFeature_quals>
      </INSDFeature>
    </INSDSeq_feature-table>
    <INSDSeq_sequence>AGCTCACCCGGTGCAGTTACCGTTTGGCGATCCCACTCTTCTCCCGCTAACAT
GCCATTCGTTGAGTTGGAAACAAACTTGCCGGCTAGCCGCATACCCGCGGGGCTGGAGAACCGGCTGTGTGCGGC
CACAGCCACCATCCTGGACAAACCCGAAGACGTGAGTGAGGGTCGGCGAGAACTTGTGGGCTAGGGTCGGACCTC
CCAATGACCCGTTCCCATCCCCAGGGACCCCACTCCCCTGGTAACCTCTGACCTTCCGTGTCCTATCCTCCCTTC
CTAGATCCCTTCCTGGTTGTCTTTCCCAGGCGTGACCCTGACGTGACTGACTCCCAAGGATCCTGGGCAGTGTCC
CAGACCCGGAGCCCTCGGACCCCCACGTTAAGAATTTGGCGCGCCCCCTTCTGAACACCAGCCATGCCCTGCCCA
AGCTTCAGGATTTAACTTTTGTGTTCCTTGCAGCGCGTGAGCGTTACGATACGACCTGGCATGACCCTGTTGATG
AACAAATCCACAGAGCCTTGTGCTCACCTTCTGGTCTCTTCCATCGGGGTTGTGGGCACCGCGGAGCAGAACCGC
ACTCACAGCGCCAGCTTCTTCAAGTTCCTCACCGAGGAGCTGTCCCTGGACCAGGACCGGTATGCAGGGCCAGTG
AGGGAACGTATTTGTGCGTCTGGAGTCAGGACTCAGTCTCTCTGTATGAGGTTGGGGGGGGGGAGGGGTCACTAT
TTGCTGGTTCCAGAAAGCACTCAGTGTCCTTGTCCACGAAGGTGGACTCCTCAGGCACTGGAATGGTGAGTCTGT
GATCAGAATGATAGCAAGATTTCAATTCCTTCGACTCTCTACAGCCCCGAGAAAGGATGGTTTGGGAAGCCCCAG
TGTTGTCTTGTGTGTACTGAGAATCTACTTAGGCACCCTCTTAACCACTGTGATAGTGGCCTCCTCACCGTCACT
GAACCAGGGGGTCTGGTTTTTTAAGGGAGAACTTTTCCAGGCTGGTCCGAGGGAATCTGGTTGTGTCCTGAGGCA
GATAACCTTTGAACTAGATAAGGCTCCGGGAGAGTTGCTGGATGATAAAAAGACCTCCCCCACAAGGTGACCCTA
CCCTCCCCCCTCCCCATCCTTACATTCTGAGGCAGAGTTAGAGTCTCATATTCCTGAGGCTGGAGCGGGCCTGTG
AAGAACTACGGAGATAAGTTTGAAAGAGCCTTCCAAAATGGAGTCCTAGTGGGCTCAGGAAAGTTGGTATTGGCT
GCTTTTGTTGGATGCTCAAATGCTGTCCTTTAGTTGAGGGGACAATACTTCTTAACGGTAATGCTCGTGCACACA
GCACAGGGCAGATTTGGTAGCTTCCTGACATAGATAACTGTATTGGGCCAGTTTTACAGATGGAAACCTGAGGGT
GTCAGCCCTGTGCACAACCACCCTGGTGCCAGACGATCGCCAGGGACTTCCTCTGAGTCCTGTGATTGAGCAATT
GCTGATTCCCACAGATTTGAATCAGATTTGAACCTGCGCCTCACTTAGAGCTGGGCTTTGGTTCAAAACTAAGTG
CCTGGTACCCTGGGCACGCCTTTAGGAGCATGCAGTTAGTTAGAAGCAGGGGGACTGTTTGTTAGCCCGTAAGCA
GCCTAACATGCTCACCTGAGCACAGAGCACAGGTATTGAAGCCATTGCGTTAAGTCTGCACTGGGACCGGTATAG
CCATCACCTTTCTTCTGACTTGTCTTTGGTGCAAGGATCATTAGCTGGGGTGGGCAGATTGGCAAAATATCCTGC
AGGCTGATATGGGCTGGCCTGTCTGGCAGGGACCTTAACAAATGAGGGGTGTATGCAGGAGTTGACATCTCTCCT
TCTTCCTCCTAAAGGATCGTTATCCGCTTCTTCCCCTTGGAGGCTTGGCAGATCGGAAAGAAAGGAACTGTCATG
ACATTTCTGTGACGGAAACAAAGAACCCAGGGTGTTTGCTCGAACCGGGCCAGAGCCCTTCCAGAGAGGCCCTCC
CGGCAGAATCGTGGCCTGGTAGATAGGATGGTAAATCCCTCTTTTGCCTAAACGTCTGCGACTTCAGTGGTCCAT
TTTTCTCTTCCCCAGCCTCGTGAATAATTGAAAGAGAGCAAATAAATGAAGAGAATATCATTC
</INSDSeq_sequence>
  </INSDSeq>
</INSDSet>