NCBI C Toolkit Cross Reference

C/doc/asn2gb.txt


  1 GENBANK FLATFILE GENERATOR
  2 
  3 A new flatfile generator has been written to replace the old asn2ff code.
  4 It is provided both as a stand-alone application, asn2gb, and as a pair of
  5 C functions in the NCBI software toolkit. There are several command-line
  6 arguments, with equivalent function parameters, that customize the behavior
  7 of the new flatfile generator and optimize its performance.
  8 
  9 NCBI maintains the GenBank nucleotide sequence database, and is part of the
 10 International Nucleotide Sequence Database (INSD) collaboration. The list
 11 of biological features and qualifiers approved by the collaborators for
 12 official release and exchange of GenBank, EMBL, and DDBJ records can be
 13 found at http://www.ncbi.nlm.nih.gov/projects/collab/FT/index.html
 14 
 15 NCBI converts all direct sequence submissions, as well as records supplied
 16 by our collaborators and other data sources, into a data model specified in
 17 Abstract Syntax Notation 1 (ASN.1) format, regardless of the original form
 18 of the data. From this common data representation we can generate GenBank
 19 or FASTA files, populate BLAST sequence databases, or build indices for the
 20 Entrez retrieval system.
 21 
 22 Since the ASN.1 data is structured for ease of computation, asn2gb and the
 23 underlying toolkit functions are able to provide useful derived information
 24 at the request of the user. For example, the sequence of bases encoding an
 25 mRNA feature can be presented in a /transcription qualifier. Because this
 26 is not an INSD-approved qualifier, it will not be present in the official
 27 flatfile release of a record. It can only be provided as an extension of
 28 the collaboration-approved format, such as in "Sequin mode" below.
 29 
 30 
 31 ASN2GB STANDALONE APPLICATION
 32 
 33 An asn2gb executable is now available on all platforms, and is distributed
 34 in the asn1-converters area of the NCBI public ftp site. The most commonly
 35 used arguments are explained below. A more detailed discussion of various
 36 parameter values is in the section on calling the SeqEntryToGnbk function
 37 from within your own program code.
 38 
 39 An input file and output file are required, but default to stdin and
 40 stdout, respectively, if not specified on the command line.
 41 
 42   -i  Input File Name
 43   -o  Output File Name
 44 
 45 GenBank format produces the conventional GenBank flatfile on nucleotide
 46 sequences. GenPept is the equivalent on protein sequences. INSDSet is a set
 47 of one or more INSDSeq elements, which is an XML structured view of the
 48 information in the flatfile. INSDSeq contains additional fields, derived
 49 from the underlying data, provided as a convenience for computing on the
 50 sequences and their feature annotations.
 51 
 52   -f  Format (b GenBank, e EMBL, p GenPept, t Feature Table, x INSDSet)
 53 
 54 Sequin mode produces a relaxed flatfile that allows unapproved qualifiers
 55 and database cross-references present in the record to be shown. It is
 56 typically used while constructing a sequence record for submission. The
 57 stricter modes are used for official GenBank releases and display of
 58 flatfiles on the Entrez web site.
 59 
 60   -m  Mode (r Release, e Entrez, s Sequin, d Dump)
 61 
 62 Normal style examines each record for "far" references (accessions not
 63 packaged along with the sequence being formatted), and shows the CONTIG
 64 block instead of separately fetching the underlying components. Master
 65 style forces the components to be fetched and displays the actual sequence
 66 letters.
 67 
 68   -s  Style (n Normal, s Segment, m Master, c Contig)
 69 
 70 Bit flags and custom flags modify the appearance of the flatfile, such as
 71 adding /transcription and /peptide extended qualifiers, eliminating the
 72 need for writing programs to extract these subsequences. Locks are used to
 73 preload "far" sequence components or look up their accession numbers, and
 74 can greatly speed up processing of records with far references. The values
 75 are decimal numbers generated by combination of the appropriate binary
 76 bits, which are described in the section on calling the SeqEntryToGnbk
 77 function.
 78 
 79   -g  Bit Flags (1 HTML, 2 XML, 4 ContigFeats, 8 ContigSrcs, 16 FarTransl)
 80   -h  Lock/Lookup Flags (8 LockProd, 16 LookupComp, 64 LookupProd)
 81   -u  Custom Flags (2 HideMostImpFeats, 4 HideSnpFeats)
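
As a rough sketch of how these decimal values are formed, the helper below
ORs individual -g bits and formats an asn2gb command line. The function
name build_asn2gb_command, the G_* macro names, and the file names are
made up for illustration; only the option letters and the bit values are
taken from the lists above. The format, mode, and style letters are fixed
at b, s, and c to keep the example short.

```c
#include <assert.h>
#include <stdio.h>

/* Bit values for -g, copied from the option list above.
   The macro names are illustrative, not toolkit identifiers. */
#define G_HTML          1
#define G_XML           2
#define G_CONTIG_FEATS  4
#define G_CONTIG_SRCS   8
#define G_FAR_TRANSL   16

/* Hypothetical helper: formats an asn2gb command line into buf and
   returns whatever snprintf returns (the length it wanted to write). */
static int build_asn2gb_command (char *buf, size_t len, const char *in,
                                 const char *out, unsigned long gflags)
{
  assert (buf != NULL && in != NULL && out != NULL);
  return snprintf (buf, len, "asn2gb -i %s -o %s -f b -m s -s c -g %lu",
                   in, out, gflags);
}
```

Calling it with G_CONTIG_FEATS | G_CONTIG_SRCS passes 12 to -g, which asks
for both features and sources in contig style.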
 82 
 83 Batch processing of Bioseq-set ASN.1 release files is also supported. The
 84 gb*.aso.gz compressed binary files from the NCBI public ftp site can also
 85 be uncompressed on-the-fly on UNIX platforms.
 86 
 87   -a  ASN.1 Type
 88       Single Record: a Any, e Seq-entry, b Bioseq, s Bioseq-set, m Seq-submit
 89       Release File: t Batch Bioseq-set, u Batch Seq-submit
 90   -b  Bioseq-set is Binary [T/F]
 91   -c  Bioseq-set is Compressed [T/F]
 92   -p  Propagate Top Descriptors [T/F]
 93 
 94   -l  Log file
 95 
 96 (Release files package many independent Seq-entry objects in a Bioseq-set.
 97 Using -a t causes asn2gb to read one component at a time, processing it and
 98 freeing it from memory before reading the next one.  Otherwise it would try
 99 to process the entire file at once, almost certainly running out of memory.)
100 
101 Remote fetching allows accession lookups and fetching of far components
102 from the NCBI network server. An indicated accession can also be fetched
103 for formatting.
104 
105   -r  Remote Fetching [T/F]
106   -A  Accession to Fetch
107 
108 
109 SEQENTRYTOGNBK FUNCTION
110 
111 The NCBI software toolkit provides flatfile generation functions for
112 programmers to incorporate into their own computer applications.
113 
114 SeqEntryToGnbk takes a SeqEntryPtr or SeqLocPtr and calls asn2gnbk_setup,
115 asn2gnbk_format, and asn2gnbk_cleanup, which are available from a private
116 header. It returns FALSE if there was a problem generating the flatfile.
117 BioseqToGnbk is simply a convenience function that takes a BioseqPtr, looks
118 up the parent SeqEntryPtr, and then calls SeqEntryToGnbk. To use these
119 functions, #include <asn2gnbk.h> in your program code.
120 
121 NLM_EXTERN Boolean SeqEntryToGnbk (
122   SeqEntryPtr sep,
123   SeqLocPtr slp,
124   FmtType format,
125   ModType mode,
126   StlType style,
127   FlgType flags,
128   LckType locks,
129   CstType custom,
130   XtraPtr extra,
131   FILE *fp
132 );
133 
134 NLM_EXTERN Boolean BioseqToGnbk (
135   BioseqPtr bsp,
136   SeqLocPtr slp,
137   FmtType format,
138   ModType mode,
139   StlType style,
140   FlgType flags,
141   LckType locks,
142   CstType custom,
143   XtraPtr extra,
144   FILE *fp
145 );
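
The following is a minimal sketch of a call, not a definitive recipe. So
that it compiles without the toolkit, it declares stand-in types and a
stub SeqEntryToGnbk; the typedefs, enum values, and stub body are
assumptions and do not reflect the toolkit's real definitions. In a real
program that section would be replaced by #include <asn2gnbk.h>. The
wrapper name write_flatfile is hypothetical, and passing NULL for slp is
assumed to mean "format the whole record".

```c
#include <assert.h>
#include <stdio.h>

/* ---- Stand-ins (assumptions) so the sketch builds on its own. ----
   Replace this whole section with #include <asn2gnbk.h> in real code. */
typedef unsigned char Boolean;
#define TRUE  1
#define FALSE 0
typedef void *SeqEntryPtr;
typedef void *SeqLocPtr;
typedef void *XtraPtr;
typedef enum { GENBANK_FMT = 1 } FmtType;
typedef enum { SEQUIN_MODE = 1 } ModType;
typedef enum { NORMAL_STYLE = 1 } StlType;
typedef unsigned long FlgType;
typedef unsigned long LckType;
typedef unsigned long CstType;

static Boolean SeqEntryToGnbk (SeqEntryPtr sep, SeqLocPtr slp,
                               FmtType format, ModType mode, StlType style,
                               FlgType flags, LckType locks, CstType custom,
                               XtraPtr extra, FILE *fp)
{
  (void) slp; (void) format; (void) mode; (void) style;
  (void) flags; (void) locks; (void) custom; (void) extra;
  if (sep == NULL || fp == NULL) return FALSE;
  fprintf (fp, "(stub: the flatfile would be written here)\n");
  return TRUE;
}
/* ---- End of stand-ins. ---- */

/* Formats one record as GenBank, Sequin mode, normal style, with no
   flags, locks, or custom bits. Returns 0 on success, -1 on failure. */
static int write_flatfile (SeqEntryPtr sep, FILE *fp)
{
  assert (fp != NULL);
  if (! SeqEntryToGnbk (sep, NULL, GENBANK_FMT, SEQUIN_MODE, NORMAL_STYLE,
                        0, 0, 0, NULL, fp)) {
    fprintf (stderr, "SeqEntryToGnbk reported a problem\n");
    return -1;
  }
  return 0;
}
```

The point of the wrapper is only to show the argument order and the check
of the Boolean result; how the SeqEntryPtr is obtained is outside the
scope of this sketch.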
146 
147 In the asn2gb application, format, mode, style, flags, locks, and custom
148 parameters are specified by the -f, -m, -s, -g, -h and -u arguments,
149 respectively.
150 
151 
152 FORMATS include GENBANK_FMT, EMBL_FMT, GENPEPT_FMT, and FTABLE_FMT
153 (Sequin's 5-column parsable feature table). If the SeqEntryPtr argument
154 passed to SeqEntryToGnbk points to a Bioseq-set, the function processes all
155 Bioseqs of the appropriate molecule type (nucleotide or protein) for the
156 specified format.
157 
158 
159 MODES are RELEASE_MODE, ENTREZ_MODE (release mode strictness except that it
160 allows local IDs and does not require a valid CDS /protein_id accession),
161 SEQUIN_MODE, and DUMP_MODE. RefSeq records can have certain qualifiers
162 (e.g., /transcript_id) and db_xrefs show up in release mode beyond those
163 approved by INSD agreement. Entrez mode is used for web display, and can
164 show new elements that haven't yet finished their 4-month quarantine period.
165 
166 
167 STYLES are NORMAL_STYLE, SEGMENT_STYLE, MASTER_STYLE, and CONTIG_STYLE.
168 Segment style is the traditional representation of segmented sequences,
169 while contig style displays a CONTIG line with a join of accessions instead
170 of a sequence. Normal style automatically chooses between segment and
171 contig style, depending upon the kind of data. (Near segmented records will
172 be done in segment style. Far segmented sequences or delta sequences with
173 no literals will be done as if you chose contig style.) Master style shows
174 features mapped to the segmented Bioseq's coordinates.
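
The rule that normal style follows can be written out as a small decision
function. This is a sketch of the description above, not the toolkit's
code: the Style enum, the RecordKind structure, and the function name
effective_style are hypothetical, and cases the text does not spell out
(for example, delta sequences that do contain literals) are simply left
as normal style.

```c
#include <assert.h>

/* Illustrative stand-ins; the real StlType values come from the toolkit. */
typedef enum { STYLE_NORMAL, STYLE_SEGMENT, STYLE_MASTER, STYLE_CONTIG } Style;

/* Hypothetical summary of a record, reduced to what the rule needs. */
typedef struct {
  int is_segmented;   /* segmented Bioseq */
  int is_delta;       /* delta Bioseq */
  int parts_are_far;  /* components are not packaged with the record */
  int has_literals;   /* delta sequence carries literal sequence data */
} RecordKind;

/* Returns the style that would effectively be used for this record. */
static Style effective_style (Style requested, RecordKind rec)
{
  assert (requested >= STYLE_NORMAL && requested <= STYLE_CONTIG);
  if (requested != STYLE_NORMAL) return requested;  /* explicit choice wins */
  if (rec.is_segmented) {
    return rec.parts_are_far ? STYLE_CONTIG : STYLE_SEGMENT;
  }
  if (rec.is_delta && ! rec.has_literals) return STYLE_CONTIG;
  return STYLE_NORMAL;  /* not covered by the description above */
}
```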
175 
176 
177 FLAGS are bit flags controlling appearance or behavior, and are ORed
178 together.
179 
180 One 2-bit flag tells asn2gnbk to create HTML with web links, flatfile in
181 XML form, or flatfile in ASN.1 form. These settings are mutually exclusive.
182 The setup for creating HTML links is within SeqEntryToGnbk itself.
183 
184 #define CREATE_HTML_FLATFILE       1
185 #define CREATE_XML_GBSEQ_FILE      2
186 #define CREATE_ASN_GBSEQ_FILE      3
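
Because these three values share the two low bits, CREATE_ASN_GBSEQ_FILE
(3) has both the HTML bit and the XML bit set. Code that inspects the
flags should therefore compare the whole 2-bit field rather than test a
single bit. A minimal sketch follows; OUTPUT_KIND_MASK and
output_kind_name are names invented here, and the three CREATE_* values
are repeated from above only to keep the example self-contained.

```c
#include <assert.h>

#define CREATE_HTML_FLATFILE   1
#define CREATE_XML_GBSEQ_FILE  2
#define CREATE_ASN_GBSEQ_FILE  3

/* Mask covering the two low bits; not a toolkit identifier. */
#define OUTPUT_KIND_MASK       3

/* Describes the output kind encoded in the low two bits of flags. */
static const char *output_kind_name (unsigned long flags)
{
  /* The mask is exactly the union of the HTML and XML bits. */
  assert (OUTPUT_KIND_MASK == (CREATE_HTML_FLATFILE | CREATE_XML_GBSEQ_FILE));
  switch (flags & OUTPUT_KIND_MASK) {
    case CREATE_HTML_FLATFILE:  return "HTML flatfile";
    case CREATE_XML_GBSEQ_FILE: return "XML";
    case CREATE_ASN_GBSEQ_FILE: return "ASN.1";
    default:                    return "plain text flatfile";
  }
}
```

A test such as (flags & CREATE_HTML_FLATFILE) would also be true for
ASN.1 output, which is why the switch compares the masked value.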
187 
188 Others control feature display behavior in contig style, whether it was
189 explicitly chosen or was called when a far segmented or far delta record
190 was processed in normal style.
191 
192 #define SHOW_CONTIG_FEATURES       4
193 #define SHOW_CONTIG_SOURCES        8
194 
195 A 2-bit flag set controls translation of CDS features with far products.
196 
197 #define SHOW_FAR_TRANSLATION      16
198 #define TRANSLATE_IF_NO_PRODUCT   32
199 #define ALWAYS_TRANSLATE_CDS      48
200 
201 The same set of flags also applies to transcription of mRNA features with far
202 products if the SHOW_TRANCRIPTION flag is also set.
203 
204 #define SHOW_FAR_TRANSCRIPTION    16
205 #define TRANSCRIBE_IF_NO_PRODUCT  32
206 #define ALWAYS_TRANSCRIBE_MRNA    48
207 
208 Any record can be shown with RefSeq policies for exception, source, and
209 other qualifiers, values, and db_xrefs that are not necessarily part of the
210 INSD agreement.
211 
212 #define REFSEQ_CONVENTIONS        64
213 
214 Another 2-bit flag controls where to get features when using far segmented
215 parts or far component delta Bioseqs.
216 
217 #define ONLY_NEAR_FEATURES       128
218 #define FAR_FEATURES_SUPPRESS    256
219 #define NEAR_FEATURES_SUPPRESS   384
220 
221 Other flags allow customization of reports from genomic product sets.
222 
223 #define COPY_GPS_CDS_UP          512
224 #define COPY_GPS_GENE_DOWN      1024
225 
226 The CONTIG block can be shown along with the sequence block in master or
227 segment style, when appropriate.
228 
229 #define SHOW_CONTIG_AND_SEQ     2048
230 
231 mRNAs and peptide features can show /transcription or /peptide sequence
232 qualifiers. This is most useful when generating INSDSeq XML so users do not
233 have to compute on the data themselves.
234 
235 #define SHOW_TRANCRIPTION       4096
236 #define SHOW_PEPTIDE            8192
237 
238 GBSeq XML has been replaced by INSDSeq XML.  The CREATE_XML_GBSEQ_FILE flag
239 will actually produce INSDSeq.  The original GBSeq can be generated during
240 the transition period by adding the following flag.
241 
242 #define PRODUCE_OLD_GBSEQ      16384
243 
244 Still others are expected to be rarely used, or are for testing new features.
245 
246 #define DDBJ_VARIANT_FORMAT    32768
247 #define SPECIAL_GAP_DISPLAY    65536
248 #define FORCE_PRIMARY_BLOCK   131072
249 
250 
251 LOCKS are bits for controlling program performance, and are also ORed
252 together.
253 
254 One flag set is for locking far segmented or delta components, far feature
255 location Bioseqs, or far feature product Bioseqs in advance. This prevents
256 the object manager from uncaching components at an inopportune time,
257 causing unnecessary thrashing. Far component Bioseqs are needed for
258 displaying the sequence.
259 
260 #define LOCK_FAR_COMPONENTS        2
261 #define LOCK_FAR_LOCATIONS         4
262 #define LOCK_FAR_PRODUCTS          8
263 
264 Another set attempts to do bulk accession to gi lookups in advance, which
265 is possible if PubSeqFetchEnable was called by the application. Remote
266 fetching in asn2gb uses this new access mechanism. Far component IDs are
267 needed for the CONTIG line, far location IDs for feature location joins,
268 and far product IDs for the /protein_id and /transcript_id accessions.
269 
270 #define LOOKUP_FAR_COMPONENTS     16
271 #define LOOKUP_FAR_LOCATIONS      32
272 #define LOOKUP_FAR_PRODUCTS       64
273 #define LOOKUP_FAR_HISTORY       128
274 #define LOOKUP_FAR_INFERENCE     256
275 #define LOOKUP_FAR_OTHERS        512
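
When debugging a -h value or a locks argument, it can help to expand the
number back into the names of the bits it contains. The sketch below does
that with a lookup table. The values are repeated from the definitions
above; the LockName structure and the describe_locks function are
hypothetical helpers, not part of the toolkit.

```c
#include <assert.h>
#include <stdio.h>

typedef struct { unsigned long bit; const char *name; } LockName;

/* Bit values as listed above. */
static const LockName lock_names[] = {
  {   2, "LOCK_FAR_COMPONENTS" },   {   4, "LOCK_FAR_LOCATIONS" },
  {   8, "LOCK_FAR_PRODUCTS" },     {  16, "LOOKUP_FAR_COMPONENTS" },
  {  32, "LOOKUP_FAR_LOCATIONS" },  {  64, "LOOKUP_FAR_PRODUCTS" },
  { 128, "LOOKUP_FAR_HISTORY" },    { 256, "LOOKUP_FAR_INFERENCE" },
  { 512, "LOOKUP_FAR_OTHERS" }
};

/* Writes the space-separated names of the bits set in locks into out.
   Returns the number of names written; stops early if out is too small. */
static int describe_locks (unsigned long locks, char *out, size_t len)
{
  size_t i, used = 0;
  int count = 0, n;

  assert (out != NULL && len > 0);
  out[0] = '\0';
  for (i = 0; i < sizeof (lock_names) / sizeof (lock_names[0]); i++) {
    if ((locks & lock_names[i].bit) == 0) continue;
    n = snprintf (out + used, len - used, "%s%s",
                  count > 0 ? " " : "", lock_names[i].name);
    if (n < 0 || (size_t) n >= len - used) {
      out[used] = '\0';   /* drop a partially written name */
      break;
    }
    used += (size_t) n;
    count++;
  }
  return count;
}
```

For instance, 88 is 8 + 16 + 64, so describe_locks reports three names
for it: the product lock plus the component and product lookups.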
276 
277 To use PubSeqFetchEnable, the application should #include <pmfapi.h>.
278 
279 
280 CUSTOM are bit flags, most of which suppress specific features or
281 references, and are also ORed together.
282 
283 One set enables display of statistics for features and references.
284 
285 #define SHOW_FEATURE_STATS         1
286 #define SHOW_REFERENCE_STATS       2
287 
288 Another set suppresses common feature types or all features.
289 
290 #define HIDE_FEATURES              4
291 
292 #define HIDE_IMP_FEATS             8
293 #define HIDE_VARS_AND_REPT_REGNS  16
294 #define HIDE_SITES_BONDS_REGIONS  32
295 #define HIDE_CDD_FEATS            64
296 #define HIDE_CDS_PROD_FEATS      128
297 
298 A 3-bit flag controls selective display of GeneRIF references or review
299 articles in RefSeq records.
300 
301 #define HIDE_GENE_RIFS           256
302 #define ONLY_GENE_RIFS           512
303 #define ONLY_REVIEW_PUBS         768
304 #define NEWEST_PUBS             1024
305 #define OLDEST_PUBS             1280
306 #define HIDE_ALL_PUBS           1792
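
Since these six values occupy a single 3-bit field, two of them should
not be ORed together: ONLY_GENE_RIFS | NEWEST_PUBS gives 1536, which is
not one of the listed settings. A safer pattern is to clear the field and
then store one value, as in this sketch. PUB_POLICY_MASK, set_pub_policy,
and get_pub_policy are names assumed for illustration; the other values
are repeated from above.

```c
#include <assert.h>

#define HIDE_GENE_RIFS     256
#define ONLY_GENE_RIFS     512
#define ONLY_REVIEW_PUBS   768
#define NEWEST_PUBS       1024
#define OLDEST_PUBS       1280
#define HIDE_ALL_PUBS     1792

/* 256 | 512 | 1024, the three bits the field occupies (assumed name). */
#define PUB_POLICY_MASK   1792

/* Returns the publication setting stored in custom, or 0 if none. */
static unsigned long get_pub_policy (unsigned long custom)
{
  return custom & PUB_POLICY_MASK;
}

/* Replaces the publication setting and leaves all other bits alone. */
static unsigned long set_pub_policy (unsigned long custom,
                                     unsigned long policy)
{
  assert ((policy & ~(unsigned long) PUB_POLICY_MASK) == 0);
  return (custom & ~(unsigned long) PUB_POLICY_MASK) | policy;
}
```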
307 
308 Protein feature tables and References in feature tables can also be shown.
309 
310 #define SHOW_PROT_FTABLE        2048
311 #define SHOW_FTABLE_REFS        4096
312 
313 Source features, instantiated Gap features, and the sequence itself can
314 also be suppressed.
315 
316 #define HIDE_SOURCE_FEATS       8192
317 #define HIDE_GAP_FEATS         16384
318 #define HIDE_SEQUENCE          32768
319 
320 Gaps in far delta sequences in Web Entrez are normally converted to a
321 shorthand notation. These can be forced to expand to runs of Ns.
322 
323 #define EXPANDED_GAP_DISPLAY   65536
324 
325 Gene Ontology terms can be suppressed if desired.
326 
327 #define HIDE_GO_TERMS         131072
328 
329 The CDS /translation can also be suppressed, even with near products.
330 
331 #define HIDE_TRANSLATION      262144
332 
333 Evidence qualifiers, including experiment and inference, can be suppressed.
334 
335 #define HIDE_EVIDENCE_QUALS   524288
336 
337 
338 EXTRA is an opaque pointer used for preparing internal NCBI indices.  Most
339 programs will pass NULL for this parameter.
340 
341 
342 SAMPLE GENBANK FLATFILE
343 
344 A sample genomic sequence encoding a spliced mRNA is shown below in GenBank
345 format. The exon features in the original record have been removed from
346 this example.
347 
348 LOCUS       AF012431                2141 bp    DNA     linear   ROD 07-FEB-2000
349 DEFINITION  Mus musculus D-dopachrome tautomerase (Ddt) gene, complete cds.
350 ACCESSION   AF012431
351 VERSION     AF012431.1  GI:2352907
352 KEYWORDS    .
353 SOURCE      Mus musculus (house mouse)
354   ORGANISM  Mus musculus
355             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
356             Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
357             Sciurognathi; Muridae; Murinae; Mus.
358 REFERENCE   1  (bases 1 to 2141)
359   AUTHORS   Esumi,N., Budarf,M., Ciccarelli,L., Sellinger,B., Kozak,C.A. and
360             Wistow,G.
361   TITLE     Conserved gene structure and genomic linkage for D-dopachrome
362             tautomerase (DDT) and MIF
363   JOURNAL   Mamm. Genome 9 (9), 753-757 (1998)
364    PUBMED   9716662
365 REFERENCE   2  (bases 1 to 2141)
366   AUTHORS   Esumi,N. and Wistow,G.
367   TITLE     Direct Submission
368   JOURNAL   Submitted (03-JUL-1997) Molecular Structure and Function, NEI,
369             Building 6, Rm. 331, NIH, Bethesda, MD 20892, USA
370 FEATURES             Location/Qualifiers
371      source          1..2141
372                      /organism="Mus musculus"
373                      /mol_type="genomic DNA"
374                      /db_xref="taxon:10090"
375                      /chromosome="10"
376      gene            1..2141
377                      /gene="Ddt"
378      mRNA            join(1..159,462..637,1868..2141)
379                      /gene="Ddt"
380                      /product="D-dopachrome tautomerase"
381      CDS             join(52..159,462..637,1868..1940)
382                      /gene="Ddt"
383                      /note="related to macrophage migration inhibitory factor
384                      (MIF); in vitro activity on D-dopachrome"
385                      /codon_start=1
386                      /product="D-dopachrome tautomerase"
387                      /protein_id="AAC77467.1"
388                      /db_xref="GI:2352908"
389                      /translation="MPFVELETNLPASRIPAGLENRLCAATATILDKPEDRVSVTIRP
390                      GMTLLMNKSTEPCAHLLVSSIGVVGTAEQNRTHSASFFKFLTEELSLDQDRIVIRFFP
391                      LEAWQIGKKGTVMTFL"
392 BASE COUNT      473 a    567 c    570 g    531 t
393 ORIGIN
394         1 agctcacccg gtgcagttac cgtttggcga tcccactctt ctcccgctaa catgccattc
395        61 gttgagttgg aaacaaactt gccggctagc cgcatacccg cggggctgga gaaccggctg
396       121 tgtgcggcca cagccaccat cctggacaaa cccgaagacg tgagtgaggg tcggcgagaa
397       181 cttgtgggct agggtcggac ctcccaatga cccgttccca tccccaggga ccccactccc
398       241 ctggtaacct ctgaccttcc gtgtcctatc ctcccttcct agatcccttc ctggttgtct
399       301 ttcccaggcg tgaccctgac gtgactgact cccaaggatc ctgggcagtg tcccagaccc
400       361 ggagccctcg gacccccacg ttaagaattt ggcgcgcccc cttctgaaca ccagccatgc
401       421 cctgcccaag cttcaggatt taacttttgt gttccttgca gcgcgtgagc gttacgatac
402       481 gacctggcat gaccctgttg atgaacaaat ccacagagcc ttgtgctcac cttctggtct
403       541 cttccatcgg ggttgtgggc accgcggagc agaaccgcac tcacagcgcc agcttcttca
404       601 agttcctcac cgaggagctg tccctggacc aggaccggta tgcagggcca gtgagggaac
405       661 gtatttgtgc gtctggagtc aggactcagt ctctctgtat gaggttgggg ggggggaggg
406       721 gtcactattt gctggttcca gaaagcactc agtgtccttg tccacgaagg tggactcctc
407       781 aggcactgga atggtgagtc tgtgatcaga atgatagcaa gatttcaatt ccttcgactc
408       841 tctacagccc cgagaaagga tggtttggga agccccagtg ttgtcttgtg tgtactgaga
409       901 atctacttag gcaccctctt aaccactgtg atagtggcct cctcaccgtc actgaaccag
410       961 ggggtctggt tttttaaggg agaacttttc caggctggtc cgagggaatc tggttgtgtc
411      1021 ctgaggcaga taacctttga actagataag gctccgggag agttgctgga tgataaaaag
412      1081 acctccccca caaggtgacc ctaccctccc ccctccccat ccttacattc tgaggcagag
413      1141 ttagagtctc atattcctga ggctggagcg ggcctgtgaa gaactacgga gataagtttg
414      1201 aaagagcctt ccaaaatgga gtcctagtgg gctcaggaaa gttggtattg gctgcttttg
415      1261 ttggatgctc aaatgctgtc ctttagttga ggggacaata cttcttaacg gtaatgctcg
416      1321 tgcacacagc acagggcaga tttggtagct tcctgacata gataactgta ttgggccagt
417      1381 tttacagatg gaaacctgag ggtgtcagcc ctgtgcacaa ccaccctggt gccagacgat
418      1441 cgccagggac ttcctctgag tcctgtgatt gagcaattgc tgattcccac agatttgaat
419      1501 cagatttgaa cctgcgcctc acttagagct gggctttggt tcaaaactaa gtgcctggta
420      1561 ccctgggcac gcctttagga gcatgcagtt agttagaagc agggggactg tttgttagcc
421      1621 cgtaagcagc ctaacatgct cacctgagca cagagcacag gtattgaagc cattgcgtta
422      1681 agtctgcact gggaccggta tagccatcac ctttcttctg acttgtcttt ggtgcaagga
423      1741 tcattagctg gggtgggcag attggcaaaa tatcctgcag gctgatatgg gctggcctgt
424      1801 ctggcaggga ccttaacaaa tgaggggtgt atgcaggagt tgacatctct ccttcttcct
425      1861 cctaaaggat cgttatccgc ttcttcccct tggaggcttg gcagatcgga aagaaaggaa
426      1921 ctgtcatgac atttctgtga cggaaacaaa gaacccaggg tgtttgctcg aaccgggcca
427      1981 gagcccttcc agagaggccc tcccggcaga atcgtggcct ggtagatagg atggtaaatc
428      2041 cctcttttgc ctaaacgtct gcgacttcag tggtccattt ttctcttccc cagcctcgtg
429      2101 aataattgaa agagagcaaa taaatgaaga gaatatcatt c
430 //
431 
432 
433 SAMPLE INSDSET XML
434 
435 The same record is shown in INSDSet XML format. INSDSeq XML is a data
436 distribution format meant to be read by a computer, not a display format
437 intended for human reading, so sequence letters are single strings of
438 characters with no spaces or newlines. (The sequences and other long lines
439 are word-wrapped here only for printing.)
440 
441 The INSDFeature_location is the string displayed exactly as it was in the
442 GenBank flatfile.
443 
444   join(1..159,462..637,1868..2141)
445 
446 For the convenience of users who wish to compute on features without having
447 to parse these strings, the individual feature intervals are also presented
448 separately.
449 
450         <INSDFeature_intervals>
451           <INSDInterval>
452             <INSDInterval_from>1</INSDInterval_from>
453             <INSDInterval_to>159</INSDInterval_to>
454             <INSDInterval_accession>AF012431.1</INSDInterval_accession>
455           </INSDInterval>
456           <INSDInterval>
457             <INSDInterval_from>462</INSDInterval_from>
458             <INSDInterval_to>637</INSDInterval_to>
459             <INSDInterval_accession>AF012431.1</INSDInterval_accession>
460           </INSDInterval>
461           <INSDInterval>
462           ...
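
To illustrate computing on the intervals directly, the sketch below adds
up interval lengths once the from and to values have been read from the
XML (the parsing itself is not shown). The Interval structure and the
spliced_length function are hypothetical. Coordinates are treated as
1-based and inclusive, and an interval written high-to-low is counted by
its absolute span, which is an assumption about how reverse-strand
intervals appear.

```c
#include <assert.h>
#include <stddef.h>

/* One INSDInterval, holding INSDInterval_from and INSDInterval_to. */
typedef struct { long from; long to; } Interval;

/* Returns the total number of bases covered by n intervals. */
static long spliced_length (const Interval *iv, size_t n)
{
  long total = 0;
  size_t i;

  assert (iv != NULL || n == 0);
  for (i = 0; i < n; i++) {
    long lo = iv[i].from < iv[i].to ? iv[i].from : iv[i].to;
    long hi = iv[i].from < iv[i].to ? iv[i].to : iv[i].from;
    total += hi - lo + 1;   /* both endpoints are included */
  }
  return total;
}
```

For the sample record, the mRNA intervals 1..159, 462..637, and
1868..2141 cover 159 + 176 + 274 = 609 bases, and the CDS intervals
52..159, 462..637, and 1868..1940 cover 108 + 176 + 73 = 357 bases, a
multiple of three.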
463 
464 The record here was generated with the SHOW_TRANCRIPTION flag set to
465 extract the bases under the mRNA feature interval and display them in a
466 transcription qualifier. This eliminates the need to process feature
467 intervals for the common task of obtaining the mRNA bases. SHOW_PEPTIDE
468 does the same for extracting peptide sequences from under sig_peptide or
469 mat_peptide features. The transcription and peptide qualifiers are
470 extensions of those approved by the INSD for official releases.
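
For comparison, this is roughly the work a program would otherwise do by
hand: copy the bases under each interval, in order, into one string. The
sketch handles only forward-strand intervals on a single sequence, so it
is a simplification and not the toolkit's implementation. The Span
structure and the splice function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 1-based, inclusive interval on the forward strand. */
typedef struct { size_t from; size_t to; } Span;

/* Concatenates the bases of seq under each span into out.
   Returns the number of bases written, or 0 (with out emptied) if a
   span is out of range or out is too small. */
static size_t splice (const char *seq, const Span *iv, size_t n,
                      char *out, size_t outlen)
{
  size_t seqlen, used = 0, len, i;

  assert (seq != NULL && out != NULL && outlen > 0);
  seqlen = strlen (seq);
  out[0] = '\0';
  for (i = 0; i < n; i++) {
    if (iv[i].from < 1 || iv[i].to < iv[i].from || iv[i].to > seqlen) {
      out[0] = '\0';
      return 0;
    }
    len = iv[i].to - iv[i].from + 1;
    if (len >= outlen - used) {   /* no room for bases plus terminator */
      out[0] = '\0';
      return 0;
    }
    memcpy (out + used, seq + iv[i].from - 1, len);
    used += len;
    out[used] = '\0';
  }
  return used;
}
```

With the sequence "aaaccctttggg" and the spans 1..3 and 7..9, splice
writes "aaattt" and returns 6.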
471 
472 <?xml version="1.0"?>
473 <!DOCTYPE INSDSet PUBLIC "-//NCBI//INSD INSDSeq/EN"
474 "http://www.ncbi.nlm.nih.gov/INSD_INSDSeq.dtd">
475 <INSDSet>
476   <INSDSeq>
477     <INSDSeq_locus>AF012431</INSDSeq_locus>
478     <INSDSeq_length>2141</INSDSeq_length>
479     <INSDSeq_moltype>DNA</INSDSeq_moltype>
480     <INSDSeq_topology>linear</INSDSeq_topology>
481     <INSDSeq_division>ROD</INSDSeq_division>
482     <INSDSeq_update-date>07-FEB-2000</INSDSeq_update-date>
483     <INSDSeq_create-date>03-SEP-1997</INSDSeq_create-date>
484     <INSDSeq_definition>Mus musculus D-dopachrome tautomerase (Ddt) gene,
485 complete cds</INSDSeq_definition>
486     <INSDSeq_primary-accession>AF012431</INSDSeq_primary-accession>
487     <INSDSeq_accession-version>AF012431.1</INSDSeq_accession-version>
488     <INSDSeq_other-seqids>
489       <INSDSeqid>gb|AF012431.1|AF012431</INSDSeqid>
490       <INSDSeqid>gi|2352907</INSDSeqid>
491     </INSDSeq_other-seqids>
492     <INSDSeq_source>Mus musculus (house mouse)</INSDSeq_source>
493     <INSDSeq_organism>Mus musculus</INSDSeq_organism>
494     <INSDSeq_taxonomy>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
495 Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
496 Sciurognathi; Muridae; Murinae; Mus</INSDSeq_taxonomy>
497     <INSDSeq_references>
498       <INSDReference>
499         <INSDReference_reference>1 (bases 1 to 2141)</INSDReference_reference>
500         <INSDReference_authors>
501           <INSDAuthor>Esumi,N.</INSDAuthor>
502           <INSDAuthor>Budarf,M.</INSDAuthor>
503           <INSDAuthor>Ciccarelli,L.</INSDAuthor>
504           <INSDAuthor>Sellinger,B.</INSDAuthor>
505           <INSDAuthor>Kozak,C.A.</INSDAuthor>
506           <INSDAuthor>Wistow,G.</INSDAuthor>
507         </INSDReference_authors>
508         <INSDReference_title>Conserved gene structure and genomic linkage for
509 D-dopachrome tautomerase (DDT) and MIF</INSDReference_title>
510         <INSDReference_journal>Mamm. Genome 9 (9), 753-757 (1998)
511 </INSDReference_journal>
512         <INSDReference_pubmed>9716662</INSDReference_pubmed>
513       </INSDReference>
514       <INSDReference>
515         <INSDReference_reference>2 (bases 1 to 2141)</INSDReference_reference>
516         <INSDReference_authors>
517           <INSDAuthor>Esumi,N.</INSDAuthor>
518           <INSDAuthor>Wistow,G.</INSDAuthor>
519         </INSDReference_authors>
520         <INSDReference_title>Direct Submission</INSDReference_title>
521         <INSDReference_journal>Submitted (03-JUL-1997) Molecular Structure and
522 Function, NEI, Building 6, Rm. 331, NIH, Bethesda, MD 20892, USA
523 </INSDReference_journal>
524       </INSDReference>
525     </INSDSeq_references>
526     <INSDSeq_feature-table>
527       <INSDFeature>
528         <INSDFeature_key>source</INSDFeature_key>
529         <INSDFeature_location>1..2141</INSDFeature_location>
530         <INSDFeature_quals>
531           <INSDQualifier>
532             <INSDQualifier_name>organism</INSDQualifier_name>
533             <INSDQualifier_value>Mus musculus</INSDQualifier_value>
534           </INSDQualifier>
535           <INSDQualifier>
536             <INSDQualifier_name>mol_type</INSDQualifier_name>
537             <INSDQualifier_value>genomic DNA</INSDQualifier_value>
538           </INSDQualifier>
539           <INSDQualifier>
540             <INSDQualifier_name>db_xref</INSDQualifier_name>
541             <INSDQualifier_value>taxon:10090</INSDQualifier_value>
542           </INSDQualifier>
543           <INSDQualifier>
544             <INSDQualifier_name>chromosome</INSDQualifier_name>
545             <INSDQualifier_value>10</INSDQualifier_value>
546           </INSDQualifier>
547         </INSDFeature_quals>
548       </INSDFeature>
549       <INSDFeature>
550         <INSDFeature_key>gene</INSDFeature_key>
551         <INSDFeature_location>1..2141</INSDFeature_location>
552         <INSDFeature_intervals>
553           <INSDInterval>
554             <INSDInterval_from>1</INSDInterval_from>
555             <INSDInterval_to>2141</INSDInterval_to>
556             <INSDInterval_accession>AF012431.1</INSDInterval_accession>
557           </INSDInterval>
558         </INSDFeature_intervals>
559         <INSDFeature_quals>
560           <INSDQualifier>
561             <INSDQualifier_name>gene</INSDQualifier_name>
562             <INSDQualifier_value>Ddt</INSDQualifier_value>
563           </INSDQualifier>
564         </INSDFeature_quals>
565       </INSDFeature>
566       <INSDFeature>
567         <INSDFeature_key>mRNA</INSDFeature_key>
568         <INSDFeature_location>join(1..159,462..637,1868..2141)
569 </INSDFeature_location>
570         <INSDFeature_intervals>
571           <INSDInterval>
572             <INSDInterval_from>1</INSDInterval_from>
573             <INSDInterval_to>159</INSDInterval_to>
574             <INSDInterval_accession>AF012431.1</INSDInterval_accession>
575           </INSDInterval>
576           <INSDInterval>
577             <INSDInterval_from>462</INSDInterval_from>
578             <INSDInterval_to>637</INSDInterval_to>
579             <INSDInterval_accession>AF012431.1</INSDInterval_accession>
580           </INSDInterval>
581           <INSDInterval>
582             <INSDInterval_from>1868</INSDInterval_from>
583             <INSDInterval_to>2141</INSDInterval_to>
584             <INSDInterval_accession>AF012431.1</INSDInterval_accession>
585           </INSDInterval>
586         </INSDFeature_intervals>
587         <INSDFeature_quals>
588           <INSDQualifier>
589             <INSDQualifier_name>gene</INSDQualifier_name>
590             <INSDQualifier_value>Ddt</INSDQualifier_value>
591           </INSDQualifier>
592           <INSDQualifier>
593             <INSDQualifier_name>product</INSDQualifier_name>
594             <INSDQualifier_value>D-dopachrome tautomerase</INSDQualifier_value>
595           </INSDQualifier>
596           <INSDQualifier>
597             <INSDQualifier_name>transcription</INSDQualifier_name>
598             <INSDQualifier_value>AGCTCACCCGGTGCAGTTACCGTTTGGCGATCCCACTCTTCT
599 CCCGCTAACATGCCATTCGTTGAGTTGGAAACAAACTTGCCGGCTAGCCGCATACCCGCGGGGCTGGAGAACCGG
600 CTGTGTGCGGCCACAGCCACCATCCTGGACAAACCCGAAGACCGCGTGAGCGTTACGATACGACCTGGCATGACC
601 CTGTTGATGAACAAATCCACAGAGCCTTGTGCTCACCTTCTGGTCTCTTCCATCGGGGTTGTGGGCACCGCGGAG
602 CAGAACCGCACTCACAGCGCCAGCTTCTTCAAGTTCCTCACCGAGGAGCTGTCCCTGGACCAGGACCGGATCGTT
603 ATCCGCTTCTTCCCCTTGGAGGCTTGGCAGATCGGAAAGAAAGGAACTGTCATGACATTTCTGTGACGGAAACAA
604 AGAACCCAGGGTGTTTGCTCGAACCGGGCCAGAGCCCTTCCAGAGAGGCCCTCCCGGCAGAATCGTGGCCTGGTA
605 GATAGGATGGTAAATCCCTCTTTTGCCTAAACGTCTGCGACTTCAGTGGTCCATTTTTCTCTTCCCCAGCCTCGT
606 GAATAATTGAAAGAGAGCAAATAAATGAAGAGAATATCATTC</INSDQualifier_value>
607           </INSDQualifier>
608         </INSDFeature_quals>
609       </INSDFeature>
610       <INSDFeature>
611         <INSDFeature_key>CDS</INSDFeature_key>
612         <INSDFeature_location>join(52..159,462..637,1868..1940)
613 </INSDFeature_location>
614         <INSDFeature_intervals>
615           <INSDInterval>
616             <INSDInterval_from>52</INSDInterval_from>
617             <INSDInterval_to>159</INSDInterval_to>
618             <INSDInterval_accession>AF012431.1</INSDInterval_accession>
619           </INSDInterval>
620           <INSDInterval>
621             <INSDInterval_from>462</INSDInterval_from>
622             <INSDInterval_to>637</INSDInterval_to>
623             <INSDInterval_accession>AF012431.1</INSDInterval_accession>
624           </INSDInterval>
625           <INSDInterval>
626             <INSDInterval_from>1868</INSDInterval_from>
627             <INSDInterval_to>1940</INSDInterval_to>
628             <INSDInterval_accession>AF012431.1</INSDInterval_accession>
629           </INSDInterval>
630         </INSDFeature_intervals>
631         <INSDFeature_quals>
632           <INSDQualifier>
633             <INSDQualifier_name>gene</INSDQualifier_name>
634             <INSDQualifier_value>Ddt</INSDQualifier_value>
635           </INSDQualifier>
636           <INSDQualifier>
637             <INSDQualifier_name>note</INSDQualifier_name>
638             <INSDQualifier_value>related to macrophage migration inhibitory
639 factor (MIF); in vitro activity on D-dopachrome</INSDQualifier_value>
640           </INSDQualifier>
641           <INSDQualifier>
642             <INSDQualifier_name>codon_start</INSDQualifier_name>
643             <INSDQualifier_value>1</INSDQualifier_value>
644           </INSDQualifier>
645           <INSDQualifier>
646             <INSDQualifier_name>transl_table</INSDQualifier_name>
647             <INSDQualifier_value>1</INSDQualifier_value>
648           </INSDQualifier>
649           <INSDQualifier>
650             <INSDQualifier_name>product</INSDQualifier_name>
651             <INSDQualifier_value>D-dopachrome tautomerase</INSDQualifier_value>
652           </INSDQualifier>
653           <INSDQualifier>
654             <INSDQualifier_name>protein_id</INSDQualifier_name>
655             <INSDQualifier_value>AAC77467.1</INSDQualifier_value>
656           </INSDQualifier>
657           <INSDQualifier>
658             <INSDQualifier_name>db_xref</INSDQualifier_name>
659             <INSDQualifier_value>GI:2352908</INSDQualifier_value>
660           </INSDQualifier>
661           <INSDQualifier>
662             <INSDQualifier_name>translation</INSDQualifier_name>
663             <INSDQualifier_value>MPFVELETNLPASRIPAGLENRLCAATATILDKPEDRVSVTI
664 RPGMTLLMNKSTEPCAHLLVSSIGVVGTAEQNRTHSASFFKFLTEELSLDQDRIVIRFFPLEAWQIGKKGTVMTF
665 L</INSDQualifier_value>
666           </INSDQualifier>
667         </INSDFeature_quals>
668       </INSDFeature>
669     </INSDSeq_feature-table>
670     <INSDSeq_sequence>AGCTCACCCGGTGCAGTTACCGTTTGGCGATCCCACTCTTCTCCCGCTAACAT
671 GCCATTCGTTGAGTTGGAAACAAACTTGCCGGCTAGCCGCATACCCGCGGGGCTGGAGAACCGGCTGTGTGCGGC
672 CACAGCCACCATCCTGGACAAACCCGAAGACGTGAGTGAGGGTCGGCGAGAACTTGTGGGCTAGGGTCGGACCTC
673 CCAATGACCCGTTCCCATCCCCAGGGACCCCACTCCCCTGGTAACCTCTGACCTTCCGTGTCCTATCCTCCCTTC
674 CTAGATCCCTTCCTGGTTGTCTTTCCCAGGCGTGACCCTGACGTGACTGACTCCCAAGGATCCTGGGCAGTGTCC
675 CAGACCCGGAGCCCTCGGACCCCCACGTTAAGAATTTGGCGCGCCCCCTTCTGAACACCAGCCATGCCCTGCCCA
676 AGCTTCAGGATTTAACTTTTGTGTTCCTTGCAGCGCGTGAGCGTTACGATACGACCTGGCATGACCCTGTTGATG
677 AACAAATCCACAGAGCCTTGTGCTCACCTTCTGGTCTCTTCCATCGGGGTTGTGGGCACCGCGGAGCAGAACCGC
678 ACTCACAGCGCCAGCTTCTTCAAGTTCCTCACCGAGGAGCTGTCCCTGGACCAGGACCGGTATGCAGGGCCAGTG
679 AGGGAACGTATTTGTGCGTCTGGAGTCAGGACTCAGTCTCTCTGTATGAGGTTGGGGGGGGGGAGGGGTCACTAT
680 TTGCTGGTTCCAGAAAGCACTCAGTGTCCTTGTCCACGAAGGTGGACTCCTCAGGCACTGGAATGGTGAGTCTGT
681 GATCAGAATGATAGCAAGATTTCAATTCCTTCGACTCTCTACAGCCCCGAGAAAGGATGGTTTGGGAAGCCCCAG
682 TGTTGTCTTGTGTGTACTGAGAATCTACTTAGGCACCCTCTTAACCACTGTGATAGTGGCCTCCTCACCGTCACT
683 GAACCAGGGGGTCTGGTTTTTTAAGGGAGAACTTTTCCAGGCTGGTCCGAGGGAATCTGGTTGTGTCCTGAGGCA
684 GATAACCTTTGAACTAGATAAGGCTCCGGGAGAGTTGCTGGATGATAAAAAGACCTCCCCCACAAGGTGACCCTA
685 CCCTCCCCCCTCCCCATCCTTACATTCTGAGGCAGAGTTAGAGTCTCATATTCCTGAGGCTGGAGCGGGCCTGTG
686 AAGAACTACGGAGATAAGTTTGAAAGAGCCTTCCAAAATGGAGTCCTAGTGGGCTCAGGAAAGTTGGTATTGGCT
687 GCTTTTGTTGGATGCTCAAATGCTGTCCTTTAGTTGAGGGGACAATACTTCTTAACGGTAATGCTCGTGCACACA
688 GCACAGGGCAGATTTGGTAGCTTCCTGACATAGATAACTGTATTGGGCCAGTTTTACAGATGGAAACCTGAGGGT
689 GTCAGCCCTGTGCACAACCACCCTGGTGCCAGACGATCGCCAGGGACTTCCTCTGAGTCCTGTGATTGAGCAATT
690 GCTGATTCCCACAGATTTGAATCAGATTTGAACCTGCGCCTCACTTAGAGCTGGGCTTTGGTTCAAAACTAAGTG
691 CCTGGTACCCTGGGCACGCCTTTAGGAGCATGCAGTTAGTTAGAAGCAGGGGGACTGTTTGTTAGCCCGTAAGCA
692 GCCTAACATGCTCACCTGAGCACAGAGCACAGGTATTGAAGCCATTGCGTTAAGTCTGCACTGGGACCGGTATAG
693 CCATCACCTTTCTTCTGACTTGTCTTTGGTGCAAGGATCATTAGCTGGGGTGGGCAGATTGGCAAAATATCCTGC
694 AGGCTGATATGGGCTGGCCTGTCTGGCAGGGACCTTAACAAATGAGGGGTGTATGCAGGAGTTGACATCTCTCCT
695 TCTTCCTCCTAAAGGATCGTTATCCGCTTCTTCCCCTTGGAGGCTTGGCAGATCGGAAAGAAAGGAACTGTCATG
696 ACATTTCTGTGACGGAAACAAAGAACCCAGGGTGTTTGCTCGAACCGGGCCAGAGCCCTTCCAGAGAGGCCCTCC
697 CGGCAGAATCGTGGCCTGGTAGATAGGATGGTAAATCCCTCTTTTGCCTAAACGTCTGCGACTTCAGTGGTCCAT
698 TTTTCTCTTCCCCAGCCTCGTGAATAATTGAAAGAGAGCAAATAAATGAAGAGAATATCATTC
699 </INSDSeq_sequence>
700   </INSDSeq>
701 </INSDSet>
702 
