WGS Example Files
Remember that the columns in a .tbl file must be tab-delimited. If the samples in which the complete sequence is included do not work, check that tabs separate the columns, not spaces.
- Simple file
- Multiple sequences in a file
- Partial coding regions
- Features on the complementary strand
Simple file
sample.fsa >Cont54 [organism=Homo sapiens] [chromosome=5] [tech=wgs] acaagcgctgctgtcgatgcaaactttagcttttaaacaagtgcaaacgcacgctgtctc acatgataacacacattatcagaatactttccatgcaatatgaaaccatagcaagctacg .... sample.tbl >Features Cont54 10400 12512 gene locus_tag CCC_03116 10400 10462 mRNA 10533 10577 10651 11098 11182 11642 11716 12512 product hypothetical protein protein_id gnl|dbname|CCC_03116 transcript_id gnl|dbname|mrna.CCC_03116 10450 10462 CDS 10533 10577 10651 11098 11182 11642 11716 12233 product hypothetical protein protein_id gnl|dbname|CCC_03116 transcript_id gnl|dbname|mrna.CCC_03116 inference profile:Genscan:2.0 15801 17688 gene locus_tag CCC_03118 15801 16607 mRNA 16750 17688 product hypothetical protein protein_id gnl|dbname|CCC_03118 transcript_id gnl|dbname|mrna.CCC_03118 15840 16607 CDS 16750 17610 product hypothetical protein protein_id gnl|dbname|CCC_03118 transcript_id gnl|dbname|mrna.CCC_03118 inference similar to RNA sequence, mRNA:INSD:AY123456.2 Here is the definition line of the flatfile view of the final record made with these files: DEFINITION Homo sapiens chromosome 5 Cont54, whole genome shotgun sequence.
Files for multiple sequences
multiple.fsa >Cont348.225 [organism=Helicobacter pylori] [strain=xxx] [tech=wgs] TTGAAGCAAGGCATTAGGCGAACCACTGCCTCTCTTTTACCTTCTTTTTTTTCCACCATTATTACTTTACTTTACATACGTTTAGGATCTGG CGAGCAGCCCAGGCGAGTGTTTTGTAGTTTTCTCGGGGCTGCCTTTTTTTCTCTCTGTGGATGTGTGTGTGGGTATGGGCTGTATTTTCCTG >Cont442.125 [organism=Helicobacter pylori] [strain=xxx] [tech=wgs] TTGAAGCAAGGCATTAGGCGAACCACTGCCTCTCTTTTACCTTCTTTTTTTTCCACCATTATTACTTTACTTTACATACGTTTAGGATCTGG CGAGCAGCCCAGGCGAGTGTTTTGTAGTTTTCTCGGGGCTGCCTTTTTTTCTCTCTGTGGATGTGTGTGTGGGTATGGGCTGTATTTTCCTG multiple.tbl >Features Cont348.225 11 109 gene locus_tag HPC_002564 gene cheA 11 109 CDS product CheA protein_id gnl|dbname|HPC_002564 >Features Cont442.125 15 113 gene locus_tag HPC_003020 15 113 CDS product TPR repeat-containing protein protein_id gnl|dbname|HPC_003020 experiment Northern blot
Partial coding region
The first coding region is partial at the 5' end and nucleotide 3 is the beginning of the first complete codon. Therefore, " < " indicates 5' partial, and codon_start "3" indicates the start of the first codon.
The second coding region is partial at the 3' end, so " > " is used to indicate 3' partial.
partial.fsa >Cont3 [organism=Mus musculus] [strain=BALB/c] [chromosome=2] [tech=wgs] TGcaaagtGGAATTCCAATTTCAACACCAGTTTTTGATGGCGCAAAAGAGCAAGATGTAACAAATATGTTAGAGCTTGCATCATTACCAAAATCTGG TCAAACAAAATTGTGGGATGGTAGAACAGGTGAAAAATTTGATAGAGAAGTCACAGTTGGCACTATTTATATGTTAAAATTACACCATCTTGTAGAA GATAAAATACACGCAAGATCTACAGGTCCTTATAGTTTAGTTACACAACAACCTCTTGGTGGTAAGGCTCAATTGGGAGGTCAACGATTTGGAGAAA TGGAAGTTTGGGCTCTGGAAGCTTATGGGGCTTCTTATACTTTACAAGAAATTTTAACAGTAAAATCTGATGATGTTGCTGGTAGAGTTAAAGTTTA TGAAACAATAGTAAAAGGTGAAGAGAATTTCGAGTCAGGAATACCTGAGTCATTTAATGTTTTAGTAAAAGAAATCAAAGCGCTAGCTCTTAATGTG GAGTTAAATTAAAATGAAAAAAGATATTAAAGATTTTTTTAAAGAAACTGCCATATCAGACTCTCAAAATTTTAATAGTATTAAAATTACTTTAGCA AGCCCTGAAAAGATAAAGTCATGGACTTATGGAGAAATAAAAAAACCCGAAACTATTAATTATAGAACTTTCAGACCTGAAAAAGACGGCCTATTTT GTGCGAGAATATTTGGTCCAATAAAAGATTACGAATGTTTATGTGGAAAATATAAAAGAATGAAGTTCAGAGGAATTATTTGTGAGAAGTGTGGCGT AGAGGTTACTAAATCAAATGTTCGTAGAGAAAGAATGGGGCACATCAATTTATCAACCCCAGTTGCACATATTTGGTTTTTAAAATCTTTACCAAGT AGAATTTCACTAGCTATTGATATGAAGCTTAAAGAGGTTGAAAGAGTTCTATACTTTGAAAGTTTTATTGTTATAGAGCCTGGATTAACTAGTCTTA AAAAAAATCAACTTTTAAACGAAGATGAATTAAATAAATATCAAGAGGAGTTTGGTGAAGAATCCTTTACTGCAGGAATAGGAGCAGAGGCGATACT AGAGATTTTAAAATCTATAGACTTGAATAAAGAGAGAGAAATTTTATTAAAAAATATAAATGAGACAAAATCAAAGGTTGCTGAAGAAAGATCTATA AAAAGATTAAAACTGATCGATTCATTTATTGAAACTGGTAACAAACCAGAATGGATGATTTTAACTACTATACCTGTAATACCACCAGAGTTAAGGC CACTTGTTCCTCTAGATGGAGGTAGATTTGCAACATCAGATCTAAACGATTTGTATAGAAGAGTTATAAATAGAAATAATAGATTGAAAAGATTAAT GGATCTTAAAGCTCCAGATATAATTATTAGAAATGAAAAACGAATGTTGCAAGAGTCAGTGGATGCTTTATTCGATAATGGCAGAAGAGGCAGAGTA ATTACAGGAACTGGTAAACGTCCATTAAAATCTTTGGCTGAAATGCTTAAAGGAaaacaaG partial.tbl >Feature Cont3 <1 >497 gene locus_tag KCS_111011 <1 497 CDS note similar to Bacillus subtilis aldolase product aldolase-like protein codon_start 3 protein_id gnl|dbname|KCS_111011 transcript_id gnl|dbname|mrna.KCS_111011 <1 >497 mRNA product aldolase-like protein protein_id gnl|dbname|KCS_111011 transcript_id gnl|dbname|mrna.KCS_111011 <499 >1516 gene locus_tag KCS_111012 499 >1516 CDS product actin-like protein protein_id gnl|dbname|KCS_111012 transcript_id gnl|dbname|mrna.KCS_111012 <499 >1516 mRNA product actin-like protein protein_id gnl|dbname|KCS_111012 transcript_id gnl|dbname|mrna.KCS_111012
Features on the complementary strand
Both genes are on the minus strand. The first CDS begins at nt1018 and is 3' partial. The second CDS is partial at its 5' end, at the end of the sequence at nt1516, and ends at nt1020. The first complete codon begins at nt1514, so it has codon_start=3.
complementary.fsa >AMCont1022 [organism=Escherichia coli] [strain=xx] [tech=wgs] CTTGTTTTCCTTTAAGCATTTCAGCCAAAGATTTTAATGGACGTTTACCAGTTCCTGTAATTACTCTGCC TCTTCTGCCATTATCGAATAAAGCATCCACTGACTCTTGCAACATTCGTTTTTCATTTCTAATAATTATA TCTGGAGCTTTAAGATCCATTAATCTTTTCAATCTATTATTTCTATTTATAACTCTTCTATACAAATCGT TTAGATCTGATGTTGCAAATCTACCTCCATCTAGAGGAACAAGTGGCCTTAACTCTGGTGGTATTACAGG TATAGTAGTTAAAATCATCCATTCTGGTTTGTTACCAGTTTCAATAAATGAATCGATCAGTTTTAATCTT TTTATAGATCTTTCTTCAGCAACCTTTGATTTTGTCTCATTTATATTTTTTAATAAAATTTCTCTCTCTT TATTCAAGTCTATAGATTTTAAAATCTCTAGTATCGCCTCTGCTCCTATTCCTGCAGTAAAGGATTCTTC ACCAAACTCCTCTTGATATTTATTTAATTCATCTTCGTTTAAAAGTTGATTTTTTTTAAGACTAGTTAAT CCAGGCTCTATAACAATAAAACTTTCAAAGTATAGAACTCTTTCAACCTCTTTAAGCTTCATATCAATAG CTAGTGAAATTCTACTTGGTAAAGATTTTAAAAACCAAATATGTGCAACTGGGGTTGATAAATTGATGTG CCCCATTCTTTCTCTACGAACATTTGATTTAGTAACCTCTACGCCACACTTCTCACAAATAATTCCTCTG AACTTCATTCTTTTATATTTTCCACATAAACATTCGTAATCTTTTATTGGACCAAATATTCTCGCACAAA ATAGGCCGTCTTTTTCAGGTCTGAAAGTTCTATAATTAATAGTTTCGGGTTTTTTTATTTCTCCATAAGT CCATGACTTTATCTTTTCAGGGCTTGCTAAAGTAATTTTAATACTATTAAAATTTTGAGAGTCTGATATG GCAGTTTCTTTAAAAAAATCTTTAATATCTTTTTTCATTTTAATTTAACTCCACATTAAGAGCTAGCGCT TTGATTTCTTTTACTAAAACATTAAATGACTCAGGTATTCCTGACTCGAAATTCTCTTCACCTTTTACTA TTGTTTCATAAACTTTAACTCTACCAGCAACATCATCAGATTTTACTGTTAAAATTTCTTGTAAAGTATA AGAAGCCCCATAAGCTTCCAGAGCCCAAACTTCCATTTCTCCAAATCGTTGACCTCCCAATTGAGCCTTA CCACCAAGAGGTTGTTGTGTAACTAAACTATAAGGACCTGTAGATCTTGCGTGTATTTTATCTTCTACAA GATGGTGTAATTTTAACATATAAATAGTGCCAACTGTGACTTCTCTATCAAATTTTTCACCTGTTCTACC ATCCCACAATTTTGTTTGACCAGATTTTGGTAATGATGCAAGCTCTAACATATTTGTTACATCTTGCTCT TTTGCGCCATCAAAAACTGGTGTTGAAATTGGAATTCCACTTTGCA complementary.tbl >Feature AMCont1022 1018 >1 gene locus_tag AMt_11123 1018 >1 CDS product hypothetical protein protein_id gnl|dbname|AMt_11123 <1516 1020 gene locus_tag AMt_11124 <1516 1020 CDS product oxidase codon_start 3 protein_id gnl|dbname|AMt_11124
WGS Resources
- About WGS
- WGS Project List
- WGS Submission Guide
- FAQ
- tbl2asn
- Create Submission Template
- BioProject
- Eukaryotic Annotation Guide
- Prokaryotic Annotation Guide
- Discrepancy Report
- WGS Example Files
- Assembly Submission Guide
- AGP Format
- WGS Submission Portal
- NCBI Prokaryotic Genome Annotation Pipeline (PGAAP)
- Metagenome Submission Guide
- Structured Comment