Submission of Annotation Using a Table


Sequin can read a five-column, tab-delimited table of feature locations and qualifiers.

The feature table format allows different kinds of features (e.g., gene, mRNA, coding region, tRNA) and qualifiers (e.g., /product, /note) to be annotated. The valid features and qualifiers are restricted to those approved by the International Nucleotide Sequence Database Collaboration. Once the annotations have been imported, they should be validated in Sequin.

The entire process can be automated with the utility tbl2asn, which produces ASN.1 from pairs of table and sequence files.

Table Layout

Sequin reads features from a five-column, tab-delimited table. The feature table specifies the location and type of each feature. Sequin processes the feature intervals and translates any CDS features into proteins. The first line of the table identifies the sequence being annotated and has the following form:

>Feature SeqId table_name

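The sequence identifier (SeqId) must match the identifier used on the sequence itself. The table_name is optional. Subsequent lines of the table list the features. Each feature begins on its own line, with its start, stop, and feature key in columns 1 to 3. Qualifiers describing that feature follow on the lines below it, one per line, with the qualifier key and value in columns 4 and 5 and the first three columns left empty. Columns are separated by tabs.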
Column 1: Start location of feature
Column 2: Stop location of feature
Column 3: Feature key
Column 4: Qualifier key
Column 5: Qualifier value
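
The column layout above can be read with a few lines of code. The following is a minimal sketch in Python, not Sequin's own parser: the `Feature` class and `parse_feature_table` function are hypothetical names, and the sketch assumes that columns are separated by real tab characters and that extra empty columns between a qualifier key and its value can be ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    key: str                                        # column 3, e.g. "gene" or "CDS"
    intervals: list = field(default_factory=list)   # (start, stop) strings from columns 1-2
    qualifiers: list = field(default_factory=list)  # (key, value) pairs from columns 4-5


def parse_feature_table(text):
    """Return (seq_id, [Feature, ...]) for one feature table (illustrative only)."""
    seq_id = None
    features = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if raw.startswith(">Feature"):
            parts = raw.split()
            seq_id = parts[1] if len(parts) > 1 else None
            continue
        cols = [c.strip() for c in raw.split("\t")]
        cols += [""] * (5 - len(cols))  # pad short lines to five columns
        start, stop, key = cols[:3]
        if start and stop:
            if key:
                features.append(Feature(key))  # a new feature starts here
            elif not features:
                raise ValueError(f"interval before any feature: {raw!r}")
            features[-1].intervals.append((start, stop))
        elif not (start or stop or key):
            # Qualifier line: columns 1-3 are empty, key and value follow.
            rest = [c for c in cols[3:] if c]
            if not features:
                raise ValueError(f"qualifier before any feature: {raw!r}")
            features[-1].qualifiers.append((rest[0], rest[1] if len(rest) > 1 else ""))
        else:
            raise ValueError(f"malformed line: {raw!r}")
    return seq_id, features


sample = (
    ">Feature Sc_16\n"
    "<1\t1050\tgene\n"
    "\t\t\tgene\tATH1\n"
    "2626\t2590\ttRNA\n"
    "2570\t2535\n"
    "\t\t\tproduct\ttRNA-Phe\n"
)
seq_id, feats = parse_feature_table(sample)
```

Locations are kept as strings so that markers such as "<" survive parsing. Checking feature and qualifier names against the approved lists is left to Sequin's validator.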

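Figure 1 shows a sample table and illustrates several points about the table format. A "<" before a start location marks a feature that is partial at its 5' end. A feature on the complementary strand is written with its start location greater than its stop location. A feature with more than one interval lists each additional interval on its own line, giving only the start and stop locations. The GenBank flatfile corresponding to this table is shown in Figure 2.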
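
To make the relationship between the two figures concrete, the sketch below turns table intervals into flatfile-style location strings. It is an illustration in Python, not the code Sequin uses. The function name `to_location` is hypothetical, and so is its `offset` argument, which is included only because Figure 2 reports the later features shifted by 2000 bases relative to the coordinates in the table.

```python
FLIP = {"<": ">", ">": "<", "": ""}


def to_location(intervals, offset=0):
    """Render (start, stop) string pairs as a GenBank-style location.

    offset is a hypothetical convenience: it is added to every coordinate.
    """
    def split(token):
        # Separate a leading partial marker ('<' or '>') from the number.
        marker = token[0] if token[0] in "<>" else ""
        return marker, int(token[len(marker):]) + offset

    spans = []
    minus = False
    for start, stop in intervals:
        (m1, a), (m2, b) = split(start), split(stop)
        if a > b:
            # Start greater than stop: complementary strand. The flatfile
            # writes the low coordinate first, so partial markers swap ends.
            minus = True
            spans.append(f"{FLIP[m2]}{b}..{FLIP[m1]}{a}")
        else:
            spans.append(f"{m1}{a}..{m2}{b}")
    if minus:
        spans.reverse()  # the table lists minus-strand intervals high to low
    body = spans[0] if len(spans) == 1 else "join(" + ",".join(spans) + ")"
    return f"complement({body})" if minus else body


gene_plus = to_location([("<1", "1050")])
trna_minus = to_location([("2626", "2590"), ("2570", "2535")], offset=2000)
cds_joined = to_location([("3522", "3572"), ("3706", "4197")], offset=2000)
```

Under these assumptions `trna_minus` is `complement(join(4535..4570,4590..4626))`, the tRNA location shown in Figure 2. The sketch does not handle single-base features or features that span the origin of a circular sequence.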

Genome Submissions

When submitting a complete bacterial genome, please review the genome guidelines. If submitting whole genome shotgun sequences, review the submission protocols. Note that when annotating complete genomes, systematic gene names and protein identifiers are required.

Single Record Submissions

A single record can be prepared like any new Sequin submission. On the starting Sequin page, click "Start New Submission" and fill out the Submitting Authors form. On the Organism and Sequences form, indicate that this is a Single sequence in FASTA format, choose the Organism and Molecule, indicate that the FASTA definition line starts with a sequence ID, and import the nucleotide sequence. Do not import a protein sequence. After you create the initial record containing the sequence, import the table using Sequin's File-->Open command. The features listed in the table are displayed immediately in the flatfile view. Carry out any desired editing, then choose Search-->Validate to check the record for errors. Use the File-->Save As command to save the record in ASN.1 format as a .sqn file. The .sqn file should be mailed to GenBank or submitted using SequinMacroSend.
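
Because the SeqId in the table must match the one on the sequence, it can be worth comparing the two files before importing them. The following Python sketch shows one way to do that. The function names are hypothetical, and it assumes the sequence ID is the first whitespace-delimited token after ">" on the FASTA definition line.

```python
def fasta_seq_id(fasta_text):
    """Return the first token of the first FASTA definition line, or None."""
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            return fields[0] if fields else None
    return None


def table_seq_id(table_text):
    """Return the SeqId from the '>Feature SeqId table_name' line, or None."""
    for line in table_text.splitlines():
        if line.startswith(">Feature"):
            parts = line.split()
            return parts[1] if len(parts) > 1 else None
    return None


def ids_match(fasta_text, table_text):
    """True when both files name the same sequence identifier."""
    fid = fasta_seq_id(fasta_text)
    return fid is not None and fid == table_seq_id(table_text)


fasta = ">Sc_16 Saccharomyces cerevisiae chromosome XVI\ncgaccacaatggtacgattg\n"
table = ">Feature Sc_16\n<1\t1050\tgene\n\t\t\tgene\tATH1\n"
ok = ids_match(fasta, table)
```

A mismatch found this way only flags the problem; the identifiers still have to be corrected in the files themselves before the table is imported.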

Multiple Record Submissions

tbl2asn is a command-line program that automates parts of the submission process and is available by FTP. tbl2asn reads a template, along with the sequence and table files, and outputs ASN.1 for submission to GenBank. The submitter therefore does not need to read each pair of table and sequence files into Sequin.
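
Since tbl2asn works on pairs of table and sequence files, a quick check that every table has a matching sequence file can save a failed run. The Python sketch below is one possible approach and rests on several assumptions: that paired files share a base name with `.fsa` and `.tbl` suffixes, that the template is passed with `-t`, and that the input directory is passed with `-p`. Confirm the options against the help output of your installed tbl2asn. The function names are hypothetical, and the sketch only builds the command; it does not run it.

```python
from pathlib import Path


def find_pairs(directory, seq_ext=".fsa", tbl_ext=".tbl"):
    """Return (pairs, unpaired): matched (sequence, table) paths and orphan tables."""
    directory = Path(directory)
    pairs, unpaired = [], []
    for tbl in sorted(directory.glob(f"*{tbl_ext}")):
        seq = tbl.with_suffix(seq_ext)  # same base name, sequence suffix
        if seq.exists():
            pairs.append((seq, tbl))
        else:
            unpaired.append(tbl)
    return pairs, unpaired


def build_command(template, directory):
    """Assemble a tbl2asn argument list (assumed options: -t template, -p directory)."""
    return ["tbl2asn", "-t", str(template), "-p", str(directory)]
```

The list returned by `build_command` can be passed to `subprocess.run` once `find_pairs` reports no unpaired tables.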

Figure 1: Sequin table format

>Feature Sc_16
1	7000	REFERENCE
			PubMed		8849441
<1	1050	gene
			gene		ATH1
<1	1009	CDS
			product		acid trehalase
			product		Ath1p
			codon_start	2
<1	1050	mRNA
			product		acid trehalase
[offset=2000]
1253	420	gene
			gene	YPR027C
1253	420	CDS
			product		Ypr027cp
			note		hypothetical protein
1253	420	mRNA
			product		Ypr027cp
2626	2535	gene
			gene	trnF
2626	2590	tRNA
2570	2535
			product		tRNA-Phe
2626	2590	exon
			number	1
2570	2535	exon
			number	2
3450	4536	gene
			gene		YIP2
3522	3572	CDS
3706	4197
			product		Yip2p
			prot_desc	similar to human polyposis locus protein 1 (YPD)
3450	3572	mRNA
3706	4536
			product		Yip2p

Figure 2: GenBank flatfile

LOCUS       Sc_16        7000 bp    DNA             PLN       08-MAY-2000
DEFINITION  Saccharomyces cerevisiae strain S288C chromosome XVI, partial sequence.
SOURCE      baker's yeast.
  ORGANISM  Saccharomyces cerevisiae
            Eukaryota; Fungi; Ascomycota; Hemiascomycetes; Saccharomycetales;
            Saccharomycetaceae; Saccharomyces.
REFERENCE   1  (bases 1 to 7000)
  AUTHORS   Goffeau,A., Barrell,B.G., Bussey,H., Davis,R.W., Dujon,B.,
            Feldmann,H., Galibert,F., Hoheisel,J.D., Jacq,C., Johnston,M.,
            Louis,E.J., Mewes,H.W., Murakami,Y., Philippsen,P., Tettelin,H. and
  TITLE     Life with 6000 genes
  JOURNAL   Science 274 (5287), 546 (1996)
   PUBMED   8849441
REFERENCE   2  (bases 1 to 7000)
  AUTHORS   Ouellette,B.F.F.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-MAY-2000) NCBI/NLM, National Institutes of Health,
            Building 38A, Room 8N805, Bethesda, MD 20894, USA
FEATURES             Location/Qualifiers
     source          1..7000
                     /organism="Saccharomyces cerevisiae"
     mRNA            <1..1050
                     /product="acid trehalase"
     gene            <1..1050
     CDS             <1..1009
                     /product="acid trehalase"
     mRNA            complement(2420..3253)
     gene            complement(2420..3253)
     CDS             complement(2420..3253)
                     /note="hypothetical protein"
     gene            complement(4535..4626)
     tRNA            complement(join(4535..4570,4590..4626))
     exon            complement(4535..4570)
     exon            complement(4590..4626)
     mRNA            join(5450..5572,5706..6536)
     gene            5450..6536
     CDS             join(5522..5572,5706..6197)
                     /note="similar to human polyposis locus protein 1 (YPD)"
BASE COUNT     2201 a   1276 c   1255 g   2268 t
        1 cgaccacaat ggtacgattg ttcataaatc aggagatgtt cctattcata taaagatacc
       61 aaacagatct ctaatacatg accaggatat caacttctat aatggttccg aaaacgaaag
      121 aaaaccaaat ctagagcgta gagacgtcga ccgtgttggt gatccaatga ggatggatag [etc.]

Revised December 14, 2009.