Eukaryotic Genome Submission Guide

Eukaryotic Genome Submission Examples

Figure 1: Sample FASTA-formatted sequence

>HTE831 [organism=Drosophila yakuba] [strain=HTE831]
tagagcaaaaaatagacattttaatggcgctaatcatacaaggaaggaataataacactg
acatggatacatccacttaatctacatttgcttattcctatcttgactatatctatatcc
[etc.]
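
The definition line in Figure 1 carries the sequence identifier followed by bracketed source modifiers. As a minimal sketch (the function name `parse_defline` is hypothetical and is not part of any NCBI tool), the line can be split into its parts like this, assuming every modifier has the form `[name=value]`:

```python
import re


def parse_defline(defline):
    """Split a FASTA definition line into (sequence ID, modifier dict)."""
    if not defline.startswith(">"):
        raise ValueError("a FASTA definition line must start with '>'")
    parts = defline[1:].split(None, 1)
    if not parts:
        raise ValueError("the definition line has no sequence identifier")
    seq_id = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    # Each modifier looks like [name=value]; values may contain spaces.
    pairs = re.findall(r"\[([^=\]]+)=([^\]]*)\]", rest)
    modifiers = {name.strip(): value.strip() for name, value in pairs}
    return seq_id, modifiers


seq_id, modifiers = parse_defline(
    ">HTE831 [organism=Drosophila yakuba] [strain=HTE831]"
)
print(seq_id)
print(modifiers)
```

The identifier returned here (HTE831) is the same one that follows `>Feature` in the feature table, which is how the two files are matched.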

Figure 2: Feature table format

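This mock example of a feature table file includes protein-coding genes with mRNA and CDS features (one alternatively spliced, one with partial gene and mRNA ends), two tRNA genes (one marked pseudo), an rRNA gene, a misc_feature, and several repeat_region features.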

Note that the relative order of the features in the file does not matter. The misc_feature and repeat_region features do not have a corresponding gene feature, so they do not have a locus_tag.

See the flatfile view of this file in Figure 3.

>Feature HTE831
63574	87173	gene
			locus_tag	Ngs_17131
63574	63907	mRNA
75690	75730
84396	85536
85598	85773
85836	86109
86173	86467
86555	86670
86731	87173
			product	hypothetical protein
			protein_id	gnl|ncbi|Ngs_17131
			transcript_id	gnl|ncbi|Ngs_mrna17131
84402	85536	CDS
85598	85773
85836	86109
86173	86467
86555	86670
86731	86882
			product	hypothetical protein
			protein_id	gnl|ncbi|Ngs_17131
			transcript_id	gnl|ncbi|Ngs_mrna17131
			inference	similar to RNA sequence, mRNA:INSD:AY123455.2 
102664	100872	gene
			locus_tag	Ngs_3038
			gene	TpnI 
102664	102502	mRNA
102400	102234
102168	100872
			product	troponin isoform B
			protein_id	gnl|ncbi|Ngs_3038B
			transcript_id	gnl|ncbi|Ngs_mrna3038B
			note	transcript variant B; alternatively spliced
102655	102234	mRNA
102168	100872
			product	troponin isoform A
			protein_id	gnl|ncbi|Ngs_3038A
			transcript_id	gnl|ncbi|Ngs_mrna3038A
			note	transcript variant A; alternatively spliced
102503	102502	CDS
102400	102234
102168	101261
			product	troponin isoform B
			protein_id	gnl|ncbi|Ngs_3038B 
			transcript_id	gnl|ncbi|Ngs_mrna3038B
			note	encoded by transcript variant B; alternatively spliced
102492	102234	CDS
102168	101261
			product	troponin isoform A
			protein_id	gnl|ncbi|Ngs_3038A
			transcript_id	gnl|ncbi|Ngs_mrna3038A
			note	encoded by transcript variant A; alternatively spliced
<112616	>115107	gene	
			locus_tag	Ngs_2945
<112616	112646	mRNA
112703	113463
113584	113762
113821	114249
114302	114464
114804	114902
114964	>115107
			product	bifunctional methylenetetrahydrofolate dehydrogenase (NADP+)/methenyltetrahydrofolate cyclohydrolase
			protein_id	gnl|ncbi|Ngs_2945
			transcript_id	gnl|ncbi|Ngs_mrna2945
112616	112646	CDS
112703	113463
113584	113762
113821	114249
114302	114464
114804	114902
114964	115107
			product	bifunctional methylenetetrahydrofolate dehydrogenase (NADP+)/methenyltetrahydrofolate cyclohydrolase
			EC_number	1.5.1.5
			EC_number	3.5.4.9
			note	bifunctional
			experiment	Western blot
			protein_id	gnl|ncbi|Ngs_2945
			transcript_id	gnl|ncbi|Ngs_mrna2945
101	180	gene
			locus_tag	Ngs_10111
			gene	trnL 
101	180	tRNA
			product	Leu 
45111	45190	gene
			locus_tag	Ngs_10112
			pseudo 
45111	45190	tRNA
			product	Xxx 
2103	400	gene
			locus_tag	Ngs_11232
2103	400	rRNA
			product	18S ribosomal RNA 
60101	60567	misc_feature
			note	similar to ABC transporters 
43027	43136	repeat_region
			mobile_element	retrotransposon:mini-me-Dpse-like{}4773
56408	56558	repeat_region
			mobile_element	retrotransposon:INE-1{}4674
62077	62147	repeat_region
			mobile_element	retrotransposon:P-T-Damb-like{}4769
63111	63154	repeat_region
			note	at-rich
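
To make the layout of the table above concrete, here is a minimal sketch of a reader for it. It is not NCBI's parser; the function names and the dictionary layout are hypothetical. It assumes only what the figure shows: a `>Feature` header, lines that start a feature (start, stop, feature key), continuation lines with further intervals (start, stop), and indented qualifier lines.

```python
def parse_feature_table(text):
    """Parse five-column feature table text into a list of feature dicts."""
    features = []
    seq_id = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if raw.startswith(">Feature"):
            seq_id = raw.split()[1]
            continue
        if raw[0].isspace():
            # Qualifier line: a key, then an optional value ("pseudo" has none).
            fields = raw.split(None, 1)
            value = fields[1].strip() if len(fields) > 1 else True
            features[-1]["qualifiers"].append((fields[0], value))
            continue
        fields = raw.split()
        start, stop = fields[0], fields[1]
        interval = (int(start.lstrip("<>")), int(stop.lstrip("<>")))
        if len(fields) > 2:
            # A third column names the feature and starts a new record.
            features.append({
                "seq_id": seq_id,
                "key": fields[2],
                "intervals": [interval],
                "partial_start": start.startswith("<"),
                "partial_stop": stop.startswith(">"),
                "qualifiers": [],
            })
        else:
            # Continuation line: another interval of the current feature.
            features[-1]["intervals"].append(interval)
            features[-1]["partial_stop"] = stop.startswith(">")
    return features


def spliced_length(feature):
    """Total number of bases covered by the feature's intervals."""
    return sum(abs(stop - start) + 1 for start, stop in feature["intervals"])


sample = (
    ">Feature HTE831\n"
    "102503\t102502\tCDS\n"
    "102400\t102234\n"
    "102168\t101261\n"
    "\t\t\tproduct\ttroponin isoform B\n"
    "\t\t\tprotein_id\tgnl|ncbi|Ngs_3038B\n"
    "45111\t45190\tgene\n"
    "\t\t\tlocus_tag\tNgs_10112\n"
    "\t\t\tpseudo\n"
)
features = parse_feature_table(sample)
cds = features[0]
print(cds["key"], cds["intervals"], spliced_length(cds))
```

A quick sanity check this enables: the spliced length of a complete CDS with codon_start 1 should be a multiple of three, which holds for the troponin isoform B CDS used in the sample.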

 

Figure 3: GenBank flatfile

This is part of the flatfile view of the .sqn file made from the .fsa file (Fig. 1) and .tbl file (Fig. 2).

     source          1..116100
                     /organism="Drosophila yakuba"
                     /mol_type="genomic DNA"
                     /strain="HTE831"
                     /db_xref="taxon:7245"
     gene            101..180
                     /gene="trnL"
                     /locus_tag="Ngs_10111" 
     tRNA            101..180
                     /gene="trnL"
                     /locus_tag="Ngs_10111"
                     /product="tRNA-Leu" 
     gene            complement(400..2103)
                     /locus_tag="Ngs_11232" 
     rRNA            complement(400..2103)
                     /locus_tag="Ngs_11232"
                     /product="18S ribosomal RNA" 
     repeat_region   43027..43136
                     /mobile_element="retrotransposon:mini-me-Dpse-like{}4773"
     gene            45111..45190
                     /locus_tag="Ngs_10112"
                     /pseudo 
     tRNA            45111..45190
                     /locus_tag="Ngs_10112"
                     /product="tRNA-OTHER" 
                     /pseudo 
     repeat_region   56408..56558
                     /mobile_element="retrotransposon:INE-1{}4674"
     misc_feature    60101..60567
                     /note="similar to ABC transporters" 
     repeat_region   62077..62147
                     /mobile_element="retrotransposon:P-T-Damb-like{}4769"
     repeat_region   63111..63154
                     /note="at-rich"
     gene            63574..87173
                     /locus_tag="Ngs_17131" 
     mRNA            join(63574..63907,75690..75730,84396..85536,85598..85773,
                     85836..86109,86173..86467,86555..86670,86731..87173)
                     /locus_tag="Ngs_17131"
                     /product="hypothetical protein" 
     CDS             join(84402..85536,85598..85773,85836..86109,86173..86467,
                     86555..86670,86731..86882)
                     /locus_tag="Ngs_17131"
                     /inference="similar to RNA sequence, mRNA:INSD:AY123455.2"
                     /codon_start=1
                     /product="hypothetical protein"
                     /translation="MQSTQSKSDRSSMHRGPLLLCAVMVVLVTLPEQINARMAFEKLT
                     DFDFPGNTYYSVKNLSLYECQGWCREEADCQAAAFSFVVNPLSPSQETHCQLQNDSSA
                     ANPSAAPQRSANMYYMIKLQLRSENVCHRPWSFERVPNKVIRGLDNALIYTSTKEACL
                     SACLNERRFVCRSVEYDYNNMKCVLSDSDRRSSGQFVQLVDAQGTDYFENLCLKPAQA
                     CKNNRSFGNSQKMGVSEEKVAQYVGLHYYTDKELQVTSESACRLACEIESEFLCRSFL
                     YLGQPQGSQYNCRLYHLDHKTLPDGPSTYLNHERPLIDHGEPIGQYFENQCEKAAGLG
                     AGSPPGTLDKIDTLPVSLDTIEDPNLTNLTRNDVNCDKTGTCYDVSVHCKDTRIAVQV
                     RTNKPFNGRIYALGRSETCNIDVINSDAFRLDLTMAGQDCNTQSVTGVYSNTVVLQHH
                     SVVMTKADKIYKVKCTYDMSSKNITFGMMPIRDPEMIHINSSPEAPPPRIRILDTRQR
                     EVETVRIGDRLNFRIEIPEDTPYGIFARSCVAMAKDARTSFKIIDDDGCPTDPTIFPG
                     FTADGNALQSTYEAFRFTESYGVIFQCNVKYCLGPCEPAVCEWNMDSFESLGRRRRRS
                     IESNDTKSEDDMNISQEILVLDFGDEKREFFKADPSTDFAKDKTVTIIEPCPTKTSVL
                     ALAVTCALMILLYISTLFCYYMKKWMQPHKIVA" 
     gene            complement(100872..102664)
                     /gene="TpnI"
                     /locus_tag="Ngs_3038" 
     mRNA            complement(join(100872..102168,102234..102400,
                     102502..102664))
                     /gene="TpnI"
                     /locus_tag="Ngs_3038"
                     /product="troponin isoform B"
                     /note="transcript variant B; alternatively spliced" 
     mRNA            complement(join(100872..102168,102234..102655))
                     /gene="TpnI"
                     /locus_tag="Ngs_3038"
                     /product="troponin isoform A"
                     /note="transcript variant A; alternatively spliced" 
     CDS             complement(join(101261..102168,102234..102400,
                     102502..102503))
                     /gene="TpnI"
                     /locus_tag="Ngs_3038"
                     /note="encoded by transcript variant B; alternatively spliced"
                     /codon_start=1
                     /product="troponin isoform B"
                     /translation="MDSSQSRKNGFLLHLPLETKRNPSNPNTPLSNLLNLTDFHYLLA
                     SNVCRKAKRELLAVLIVTSYAGHDALRSAHRQAIPQSKLEEMGLRRVFLLAALPSREH
                     FISQDQLASEQNRFGDLLQGNFIEDYRNLSYKHVMGLKWVSEECKKQAKFIIKLDDDI
                     IYDVFHLRRYLETLEVREPGLATSSTLLSGYVLDAKPPIRLRANKWYVSKKEYPQALY
                     PAYLSGWLYVTNVPTAERIVAEAERMSFFWIDDTWLTGVVRTRLGIPLERHNDWFSAN
                     AEFIDCCVRDLKKHNYECEYSVGPNGGDDRLLVEFLHNVEKCYFDECVKRPVGKSLKE
                     TCLAAAKSRPPKHGFPEIKALRLR" 
     CDS             complement(join(101261..102168,102234..102492))
                     /gene="TpnI"
                     /locus_tag="Ngs_3038"
                     /note="encoded by transcript variant A; alternatively spliced"
                     /codon_start=1
                     /product="troponin isoform A"
                     /translation="MRMRGRRLLPIILSLLLIVLLSLCYFSNHLRDSSQSRKNGFLLH
                     LPLETKRNPSNPNTPLSNLLNLTDFHYLLASNVCRKAKRELLAVLIVTSYAGHDALRS
                     AHRQAIPQSKLEEMGLRRVFLLAALPSREHFISQDQLASEQNRFGDLLQGNFIEDYRN
                     LSYKHVMGLKWVSEECKKQAKFIIKLDDDIIYDVFHLRRYLETLEVREPGLATSSTLL
                     SGYVLDAKPPIRLRANKWYVSKKEYPQALYPAYLSGWLYVTNVPTAERIVAEAERMSF
                     FWIDDTWLTGVVRTRLGIPLERHNDWFSANAEFIDCCVRDLKKHNYECEYSVGPNGGD
                     DRLLVEFLHNVEKCYFDECVKRPVGKSLKETCLAAAKSRPPKHGFPEIKALRLR" 
     gene            <112616..>115107
                     /locus_tag="Ngs_2945" 
     mRNA            join(<112616..112646,112703..113463,113584..113762,
                     113821..114249,114302..114464,114804..114902,
                     114964..>115107)
                     /locus_tag="Ngs_2945"
                     /product="bifunctional methylenetetrahydrofolate dehydrogenase (NADP+)/methenyltetrahydrofolate cyclohydrolase" 
     CDS             join(112616..112646,112703..113463,113584..113762,
                     113821..114249,114302..114464,114804..114902,
                     114964..115107)
                     /locus_tag="Ngs_2945"
                     /EC_number="3.5.4.9"
                     /EC_number="1.5.1.5"
                     /experiment="Western blot"
                     /codon_start=1
                     /product="bifunctional methylenetetrahydrofolate dehydrogenase (NADP+)/methenyltetrahydrofolate cyclohydrolase"
                     /translation="MESITFGVLTISDTCWQEPEKDTSGPILRQLIGETFANTQVIGN
                     IVPDEKDIIQQELRKWIDREELRVILTTGGTGFAPRDVTPEATRQLLEKECPQLSMYI
                     TLESIKQTQYAALSRGLCGIAGNTLILNLPGSEKAVKECFQTISALLPHAVHLIGDDV
                     SLVRKTHAEVQGSAQKSHICPHKTGTGTDSDRNSPYPMLPVQEVLSIIFNTVQKTANL
                     NKILLEMNAPVNIPPFRASIKDGYAMKSTGFSGTKRVLGCIAAGDSPNSLPLAEDECY
                     KINTGAPLPLEADCVVQVEDTKLLQLDKNGQESLVDILVEPQAGLDVRPVGYDLSTND
                     RIFPALDPSPVVVKSLLASVGNRLILSKPKVAIVSTGSELCSPRNQLTPGKIFDSNTT
                     MLTELLVYFGFNCMHTCVLSDSFQRTKESLLELFEVVDFVICSGGVSMGDKDFVKSVL
                     EDLQFRIHCGRVNIKPGKPMTFASRKDKYFFGLPGNPVSAFVTFHLFALPAIRFAAGW
                     DRCKCSLSVLNVKLLNDFSLDSRPEFVRASVISKSGELYASVNGNQISSRLQSIVGAD
                     VLINLPARTSDRPLAKAGEIFPASVLRFDFISKYE" 
ORIGIN
        1 tagagcaaaa aatagacatt ttaatggcgc taatcataca aggaaggaat aataacactg
       61 acatggatac atccacttaa tctacatttg cttattccta tcttgactat atctatatcc
       [etc.]
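
Comparing Figures 2 and 3 shows how feature table intervals map to flatfile locations: intervals listed with start greater than stop become `complement(...)`, and multiple intervals become `join(...)` in ascending order. The sketch below illustrates that mapping. The function names are hypothetical, and the treatment of partial symbols on the minus strand (swapping `<` and `>`) is an assumption based on the usual flatfile convention rather than something shown in the figures.

```python
def _coord(token):
    """Numeric position of a table coordinate such as '<112616'."""
    return int(token.lstrip("<>"))


def _flip(token):
    """Swap the partial symbol when a coordinate changes ends (assumption)."""
    if token.startswith("<"):
        return ">" + token[1:]
    if token.startswith(">"):
        return "<" + token[1:]
    return token


def to_location(intervals):
    """Render feature-table (start, stop) string pairs as a flatfile location."""
    minus = any(_coord(start) > _coord(stop) for start, stop in intervals)
    if minus:
        # The flatfile lists minus-strand spans in ascending order, low..high.
        spans = [f"{_flip(stop)}..{_flip(start)}"
                 for start, stop in reversed(intervals)]
    else:
        spans = [f"{start}..{stop}" for start, stop in intervals]
    location = spans[0] if len(spans) == 1 else "join(" + ",".join(spans) + ")"
    return f"complement({location})" if minus else location


# rRNA gene Ngs_11232 from Figure 2
print(to_location([("2103", "400")]))
# mRNA for troponin isoform B from Figure 2
print(to_location([("102664", "102502"),
                   ("102400", "102234"),
                   ("102168", "100872")]))
# partial gene Ngs_2945 from Figure 2
print(to_location([("<112616", ">115107")]))
```

The three calls reproduce the corresponding locations in Figure 3: `complement(400..2103)`, `complement(join(100872..102168,102234..102400,102502..102664))`, and `<112616..>115107`.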



Revised October 16, 2007
