WGS File Examples

Remember that the columns in a .tbl file must be tab-delimited. If one of the samples that includes the complete sequence does not work, check that the columns are separated by tabs, not spaces.
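
The tab requirement can be checked mechanically before submission. The following is a minimal Python sketch, not part of any NCBI tool: the function name `find_suspect_lines` and the two patterns are assumptions made for illustration, and they only cover the line shapes used in the samples on this page (coordinate lines and three-tab qualifier lines). It also assumes that any indentation added by the page layout has been removed from the copied text.

```python
import re

# Location line: start<TAB>stop, optionally followed by <TAB>feature key.
LOCATION = re.compile(r"[<>]?\d+\t[<>]?\d+(\t[^\t ]+)?")
# Qualifier line: three leading tabs, then the key, optionally <TAB>value.
QUALIFIER = re.compile(r"\t\t\t[^\t ]+(\t.*)?")


def find_suspect_lines(tbl_text):
    """Return (line_number, line) pairs that do not look tab-delimited."""
    suspects = []
    for number, line in enumerate(tbl_text.splitlines(), start=1):
        content = line.rstrip()  # trailing whitespace is ignored here
        if not content or content.startswith(">"):
            continue  # blank lines and ">Feature" headers have no columns
        pattern = QUALIFIER if content.startswith("\t") else LOCATION
        if pattern.fullmatch(content) is None:
            suspects.append((number, line))
    return suspects


sample = (
    ">Features Cont54\n"
    "10400\t12512\tgene\n"
    "\t\t\tlocus_tag\tCCC_03116\n"
    "10400 12512 gene\n"  # spaces instead of tabs
)
print(find_suspect_lines(sample))  # [(4, '10400 12512 gene')]
```

Only the last line of the sample is reported, because its columns are separated by spaces.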

  • Simple file
  • Multiple sequences in a file
  • Partial coding regions
  • Features on the complementary strand

  • Simple file
    sample.fsa
    
    >Cont54 [organism=Homo sapiens] [chromosome=5] [tech=wgs]
    acaagcgctgctgtcgatgcaaactttagcttttaaacaagtgcaaacgcacgctgtctc
    acatgataacacacattatcagaatactttccatgcaatatgaaaccatagcaagctacg
    ....
    
    sample.tbl
    
    >Features Cont54
    10400	12512	gene
    			locus_tag	CCC_03116
    10400	10462	mRNA
    10533	10577
    10651	11098
    11182	11642
    11716	12512
    			product	hypothetical protein
    10450	10462	CDS
    10533	10577
    10651	11098
    11182	11642
    11716	12233
    			product	hypothetical protein
    			protein_id	gnl|dbname|CCC_03116
    			inference	profile:Genscan:2.0
    15801	17688	gene
    			locus_tag	CCC_03118
    15801	16607	mRNA
    16750	17688
    			product	hypothetical protein
    15840	16607	CDS
    16750	17610
    			product	hypothetical protein
    			protein_id	gnl|dbname|CCC_03118
    			inference	similar to RNA sequence, mRNA:INSD:AY123456.2
    
    
    
    Here is the definition line of the flatfile view of the final record made
    with these files:
    
    DEFINITION  Homo sapiens chromosome 5 Cont54, whole genome shotgun sequence.
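
The definition line above appears to reuse the sequence ID and the bracketed modifiers from the first line of sample.fsa. As a minimal Python sketch, assuming modifiers always take the `[name=value]` form seen in these samples, they can be pulled out as follows. The function name `parse_defline` is hypothetical and is not part of Sequin or tbl2asn.

```python
import re

# One bracketed modifier such as [organism=Homo sapiens].
MODIFIER = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def parse_defline(defline):
    """Return (sequence_id, modifiers) for a FASTA line like those in the samples."""
    if not defline.startswith(">"):
        raise ValueError("a FASTA definition line must start with '>'")
    body = defline[1:].strip()
    seq_id = body.split(None, 1)[0] if body else ""
    modifiers = {
        name.strip(): value.strip() for name, value in MODIFIER.findall(body)
    }
    return seq_id, modifiers


seq_id, mods = parse_defline(
    ">Cont54 [organism=Homo sapiens] [chromosome=5] [tech=wgs]"
)
print(seq_id)  # Cont54
print(mods)    # {'organism': 'Homo sapiens', 'chromosome': '5', 'tech': 'wgs'}
```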
    
    


     

  • Multiple sequences in a file
    
    multiple.fsa
    
    >Cont348.225 [organism=Helicobacter pylori] [strain=xxx] [tech=wgs]
    TTGAAGCAAGGCATTAGGCGAACCACTGCCTCTCTTTTACCTTCTTTTTTTTCCACCATTATTACTTTACTTTACATACGTTTAGGATCTGG
    CGAGCAGCCCAGGCGAGTGTTTTGTAGTTTTCTCGGGGCTGCCTTTTTTTCTCTCTGTGGATGTGTGTGTGGGTATGGGCTGTATTTTCCTG
    >Cont442.125 [organism=Helicobacter pylori] [strain=xxx] [tech=wgs]
    TTGAAGCAAGGCATTAGGCGAACCACTGCCTCTCTTTTACCTTCTTTTTTTTCCACCATTATTACTTTACTTTACATACGTTTAGGATCTGG
    CGAGCAGCCCAGGCGAGTGTTTTGTAGTTTTCTCGGGGCTGCCTTTTTTTCTCTCTGTGGATGTGTGTGTGGGTATGGGCTGTATTTTCCTG
    
    
    multiple.tbl
    
    >Features Cont348.225
    11	109	gene
    			locus_tag	HPC_002564
    11	109	CDS
    			product	HPC_002564
    			protein_id	gnl|dbname|HPC_002564
    >Features Cont442.125
    15	113	gene
    			locus_tag	HPC_003020
    			gene	cheA
    15	113	CDS
    			product	CheA
    			protein_id	gnl|dbname|HPC_003020
    			experiment	Northern blot
    


     

  • Partial coding regions

    The first coding region is partial at the 5' end, and nucleotide 3 begins its first complete codon. Therefore "<" marks the 5' partial end, and codon_start is set to 3 to give the position of the first complete codon.

    The second coding region is partial at the 3' end, so ">" marks the 3' partial end.
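
The notation described above can be read programmatically. This is a minimal Python sketch for plus-strand features only. The names `parse_location` and `first_complete_codon` are assumptions made for illustration and do not come from an NCBI library.

```python
def parse_location(start_field, stop_field):
    """Split a .tbl coordinate pair into numbers and partial flags."""
    return {
        "start": int(start_field.lstrip("<>")),
        "stop": int(stop_field.lstrip("<>")),
        "partial_5": start_field.startswith("<"),  # "<" on the first column
        "partial_3": stop_field.startswith(">"),   # ">" on the second column
    }


def first_complete_codon(start, codon_start=1):
    """Plus-strand position where the first complete codon begins."""
    if codon_start not in (1, 2, 3):
        raise ValueError("codon_start must be 1, 2 or 3")
    return start + codon_start - 1


first_cds = parse_location("<1", "497")       # CDS line of the first gene
second_cds = parse_location("499", ">1516")   # CDS line of the second gene

print(first_cds["partial_5"], first_cds["partial_3"])           # True False
print(first_complete_codon(first_cds["start"], codon_start=3))  # 3
print(second_cds["partial_5"], second_cds["partial_3"])         # False True
```

With codon_start 3, translation of the first CDS covers nucleotides 3 through 497, which is 495 bases, or exactly 165 codons.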

    
    partial.fsa
    
    >Cont3  [organism=Mus musculus] [strain=BALB/c] [chromosome=2] [tech=wgs]
    TGcaaagtGGAATTCCAATTTCAACACCAGTTTTTGATGGCGCAAAAGAGCAAGATGTAACAAATATGTTAGAGCTTGCATCATTACCAAAATCTGG
    TCAAACAAAATTGTGGGATGGTAGAACAGGTGAAAAATTTGATAGAGAAGTCACAGTTGGCACTATTTATATGTTAAAATTACACCATCTTGTAGAA
    GATAAAATACACGCAAGATCTACAGGTCCTTATAGTTTAGTTACACAACAACCTCTTGGTGGTAAGGCTCAATTGGGAGGTCAACGATTTGGAGAAA
    TGGAAGTTTGGGCTCTGGAAGCTTATGGGGCTTCTTATACTTTACAAGAAATTTTAACAGTAAAATCTGATGATGTTGCTGGTAGAGTTAAAGTTTA
    TGAAACAATAGTAAAAGGTGAAGAGAATTTCGAGTCAGGAATACCTGAGTCATTTAATGTTTTAGTAAAAGAAATCAAAGCGCTAGCTCTTAATGTG
    GAGTTAAATTAAAATGAAAAAAGATATTAAAGATTTTTTTAAAGAAACTGCCATATCAGACTCTCAAAATTTTAATAGTATTAAAATTACTTTAGCA
    AGCCCTGAAAAGATAAAGTCATGGACTTATGGAGAAATAAAAAAACCCGAAACTATTAATTATAGAACTTTCAGACCTGAAAAAGACGGCCTATTTT
    GTGCGAGAATATTTGGTCCAATAAAAGATTACGAATGTTTATGTGGAAAATATAAAAGAATGAAGTTCAGAGGAATTATTTGTGAGAAGTGTGGCGT
    AGAGGTTACTAAATCAAATGTTCGTAGAGAAAGAATGGGGCACATCAATTTATCAACCCCAGTTGCACATATTTGGTTTTTAAAATCTTTACCAAGT
    AGAATTTCACTAGCTATTGATATGAAGCTTAAAGAGGTTGAAAGAGTTCTATACTTTGAAAGTTTTATTGTTATAGAGCCTGGATTAACTAGTCTTA
    AAAAAAATCAACTTTTAAACGAAGATGAATTAAATAAATATCAAGAGGAGTTTGGTGAAGAATCCTTTACTGCAGGAATAGGAGCAGAGGCGATACT
    AGAGATTTTAAAATCTATAGACTTGAATAAAGAGAGAGAAATTTTATTAAAAAATATAAATGAGACAAAATCAAAGGTTGCTGAAGAAAGATCTATA
    AAAAGATTAAAACTGATCGATTCATTTATTGAAACTGGTAACAAACCAGAATGGATGATTTTAACTACTATACCTGTAATACCACCAGAGTTAAGGC
    CACTTGTTCCTCTAGATGGAGGTAGATTTGCAACATCAGATCTAAACGATTTGTATAGAAGAGTTATAAATAGAAATAATAGATTGAAAAGATTAAT
    GGATCTTAAAGCTCCAGATATAATTATTAGAAATGAAAAACGAATGTTGCAAGAGTCAGTGGATGCTTTATTCGATAATGGCAGAAGAGGCAGAGTA
    ATTACAGGAACTGGTAAACGTCCATTAAAATCTTTGGCTGAAATGCTTAAAGGAaaacaaG
    
    partial.tbl
     
    >Feature Cont3
    <1	>497	gene
    			locus_tag	KCS_111011
    <1	497	CDS
    			note	similar to Bacillus subtilis aldolase
    			product	aldolase-like protein
    			codon_start	3
    			protein_id	gnl|dbname|1084002312452
    <1	>497	mRNA
    			product	aldolase-like protein
    <499	>1516	gene
    			locus_tag	KCS_111012
    499	>1516	CDS
    			product	actin-like protein
    			protein_id	gnl|dbname|1084002312450
    <499	>1516	mRNA
    			product	actin-like protein
    


     

  • Features on the complementary strand

    Both genes are on the minus strand, so each start coordinate is larger than its stop coordinate. The first CDS begins at nt 1018 and is 3' partial. The second CDS is 5' partial: it begins at the end of the sequence, nt 1516, and ends at nt 1020. Its first complete codon begins at nt 1514, so it has codon_start=3.
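
The coordinate arithmetic for minus-strand features can be sketched in Python as below. The function names are hypothetical, and treating a feature with start equal to stop as plus strand is an assumption of this sketch.

```python
def strand(start, stop):
    """A feature whose start is larger than its stop lies on the minus strand."""
    return "minus" if start > stop else "plus"


def first_complete_codon(start, stop, codon_start=1):
    """Position where the first complete codon begins, on either strand."""
    if codon_start not in (1, 2, 3):
        raise ValueError("codon_start must be 1, 2 or 3")
    offset = codon_start - 1
    # On the minus strand, translation runs toward smaller coordinates.
    return start - offset if start > stop else start + offset


# First CDS in complementary.tbl: 1018 >1
print(strand(1018, 1), first_complete_codon(1018, 1))                    # minus 1018
# Second CDS in complementary.tbl: <1516 1020 with codon_start 3
print(strand(1516, 1020), first_complete_codon(1516, 1020, codon_start=3))  # minus 1514
```

From nt 1514 down to nt 1020 there are 495 bases, a whole number of codons, which agrees with a CDS that is complete at its 3' end.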

    complementary.fsa
    
    >AMCont1022  [organism=Escherichia coli] [strain=xx] [tech=wgs]
    CTTGTTTTCCTTTAAGCATTTCAGCCAAAGATTTTAATGGACGTTTACCAGTTCCTGTAATTACTCTGCC
    TCTTCTGCCATTATCGAATAAAGCATCCACTGACTCTTGCAACATTCGTTTTTCATTTCTAATAATTATA
    TCTGGAGCTTTAAGATCCATTAATCTTTTCAATCTATTATTTCTATTTATAACTCTTCTATACAAATCGT
    TTAGATCTGATGTTGCAAATCTACCTCCATCTAGAGGAACAAGTGGCCTTAACTCTGGTGGTATTACAGG
    TATAGTAGTTAAAATCATCCATTCTGGTTTGTTACCAGTTTCAATAAATGAATCGATCAGTTTTAATCTT
    TTTATAGATCTTTCTTCAGCAACCTTTGATTTTGTCTCATTTATATTTTTTAATAAAATTTCTCTCTCTT
    TATTCAAGTCTATAGATTTTAAAATCTCTAGTATCGCCTCTGCTCCTATTCCTGCAGTAAAGGATTCTTC
    ACCAAACTCCTCTTGATATTTATTTAATTCATCTTCGTTTAAAAGTTGATTTTTTTTAAGACTAGTTAAT
    CCAGGCTCTATAACAATAAAACTTTCAAAGTATAGAACTCTTTCAACCTCTTTAAGCTTCATATCAATAG
    CTAGTGAAATTCTACTTGGTAAAGATTTTAAAAACCAAATATGTGCAACTGGGGTTGATAAATTGATGTG
    CCCCATTCTTTCTCTACGAACATTTGATTTAGTAACCTCTACGCCACACTTCTCACAAATAATTCCTCTG
    AACTTCATTCTTTTATATTTTCCACATAAACATTCGTAATCTTTTATTGGACCAAATATTCTCGCACAAA
    ATAGGCCGTCTTTTTCAGGTCTGAAAGTTCTATAATTAATAGTTTCGGGTTTTTTTATTTCTCCATAAGT
    CCATGACTTTATCTTTTCAGGGCTTGCTAAAGTAATTTTAATACTATTAAAATTTTGAGAGTCTGATATG
    GCAGTTTCTTTAAAAAAATCTTTAATATCTTTTTTCATTTTAATTTAACTCCACATTAAGAGCTAGCGCT
    TTGATTTCTTTTACTAAAACATTAAATGACTCAGGTATTCCTGACTCGAAATTCTCTTCACCTTTTACTA
    TTGTTTCATAAACTTTAACTCTACCAGCAACATCATCAGATTTTACTGTTAAAATTTCTTGTAAAGTATA
    AGAAGCCCCATAAGCTTCCAGAGCCCAAACTTCCATTTCTCCAAATCGTTGACCTCCCAATTGAGCCTTA
    CCACCAAGAGGTTGTTGTGTAACTAAACTATAAGGACCTGTAGATCTTGCGTGTATTTTATCTTCTACAA
    GATGGTGTAATTTTAACATATAAATAGTGCCAACTGTGACTTCTCTATCAAATTTTTCACCTGTTCTACC
    ATCCCACAATTTTGTTTGACCAGATTTTGGTAATGATGCAAGCTCTAACATATTTGTTACATCTTGCTCT
    TTTGCGCCATCAAAAACTGGTGTTGAAATTGGAATTCCACTTTGCA
    
    
    complementary.tbl
    
    >Feature AMCont1022
    1018	>1	gene
    			locus_tag	AMt_11123
    1018	>1	CDS
    			product	hypothetical protein
    			protein_id	lcl|AMt_11123
    <1516	1020	gene
    			locus_tag	AMt_11124
    <1516	1020	CDS
    			product	oxidase
    			codon_start	3
    			protein_id	lcl|AMt_11124
    
    
     

    Revised: November 9, 2005.