Submission of Annotation Using a Table
Sequin can read a five-column, tab-delimited table of feature locations and qualifiers.
The feature table format allows different kinds of features (e.g., gene, mRNA, coding region, tRNA) and qualifiers (e.g., /product, /note) to be annotated. The valid features and qualifiers are restricted to those approved by the International Nucleotide Sequence Database Collaboration. Once the annotations have been imported, they should be validated in Sequin.
The entire process can be automated with the utility tbl2asn, which produces ASN.1 from pairs of table and sequence files.
The feature table specifies the location and type of each feature. Sequin processes the feature intervals and translates any CDS features into proteins. The first line of the table identifies the sequence and, optionally, names the table:
>Feature SeqId table_name
The sequence identifier (SeqId) must match the identifier used on the sequence. The table_name is optional. Subsequent lines of the table list the features, one feature per line. Qualifiers describing a feature go on the lines below it, one qualifier per line. Columns are separated by tabs.
Line 1:
Column 1: Start location of feature
Column 2: Stop location of feature
Column 3: Feature key
Line 2:
Column 4: Qualifier key
Column 5: Qualifier value
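The column layout above can be read with a few lines of Python. The following is only a minimal sketch, not Sequin's own parser: the names `Feature` and `parse_feature_table` are hypothetical, coordinates are kept as strings exactly as written, and any line that fits neither the feature nor the qualifier pattern is skipped.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One feature: its key, its intervals, and its qualifiers (hypothetical container)."""
    key: str
    intervals: list = field(default_factory=list)   # (start, stop) strings as written
    qualifiers: list = field(default_factory=list)  # (qualifier key, value) pairs


def parse_feature_table(text):
    """Parse a five-column, tab-delimited feature table.

    Returns (seq_id, table_name, features). table_name is None when absent.
    """
    seq_id, table_name, features = None, None, []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if raw.startswith(">Feature"):
            parts = raw.split()
            seq_id = parts[1]
            table_name = parts[2] if len(parts) > 2 else None
            continue
        cols = raw.split("\t")
        cols += [""] * (5 - len(cols))  # pad short lines to five columns
        start, stop, key, qual_key, qual_value = cols[:5]
        if start and stop and key:
            # Columns 1-3 filled: a new feature begins.
            features.append(Feature(key, [(start, stop)]))
        elif start and stop and features:
            # Only columns 1-2 filled: an extra interval of the current feature.
            features[-1].intervals.append((start, stop))
        elif qual_key and features:
            # Columns 4-5 filled: a qualifier of the current feature.
            features[-1].qualifiers.append((qual_key, qual_value))
        # Anything else is ignored in this sketch.
    return seq_id, table_name, features


sample = (
    ">Feature Sc_16\n"
    "<1\t1009\tCDS\n"
    "\t\t\tproduct\tacid trehalase\n"
    "\t\t\tcodon_start\t2\n"
)
seq_id, table_name, features = parse_feature_table(sample)
print(seq_id, features[0].key, features[0].qualifiers)
```

Keeping the coordinates as strings preserves any partial marker such as `<` so that nothing is lost before validation.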
Figure 1 shows a sample table and illustrates a number of points about the table format. The GenBank flatfile corresponding to this table is shown in Figure 2.
When submitting a complete bacterial genome, please review the genome guidelines. If submitting whole genome shotgun sequences, review the submission protocols. Note that when annotating complete genomes, systematic gene names and protein identifiers are required.
Single submissions can be treated like a new Sequin submission. On the starting Sequin page, click on "Start New Submission", and fill out the Submitting Authors form. On the Organism and Sequences form, indicate that this is a Single sequence in FASTA format, choose the Organism and Molecule, indicate that the FASTA definition line starts with a sequence ID, and import the nucleotide sequence. Do not import protein sequence. After you create the initial record containing the sequence, import the table using Sequin's File-->Open command. The features listed in the table will be immediately displayed on the flatfile view. Carry out any desired editing, then choose Search-->Validate to check for errors in the record. Use the File-->Save As command to save the record in ASN.1 format as a .sqn file. The .sqn file should be mailed to GenBank or submitted using SequinMacroSend.
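Because the table is matched to the sequence by its SeqId, it can help to confirm that the two files agree before importing the table. The sketch below is one possible check, not part of Sequin; the function names are hypothetical, and it assumes the sequence ID is the first whitespace-delimited token after `>` on the FASTA definition line.

```python
def fasta_seq_id(fasta_text):
    """Return the ID that starts the first FASTA definition line."""
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("FASTA definition line has no sequence ID")
            return tokens[0]
    raise ValueError("no FASTA definition line found")


def table_seq_id(table_text):
    """Return the SeqId from the '>Feature SeqId table_name' line."""
    for line in table_text.splitlines():
        if line.startswith(">Feature"):
            tokens = line.split()
            if len(tokens) < 2:
                raise ValueError("'>Feature' line has no SeqId")
            return tokens[1]
    raise ValueError("no '>Feature' line found")


def ids_match(fasta_text, table_text):
    """True when the feature table refers to the sequence in the FASTA text."""
    return fasta_seq_id(fasta_text) == table_seq_id(table_text)


fasta = ">Sc_16 Saccharomyces cerevisiae chromosome XVI\ncgaccacaat\n"
table = ">Feature Sc_16\n1\t7000\tREFERENCE\n"
print(ids_match(fasta, table))  # prints True
```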
tbl2asn is a command-line program, available via FTP, that automates parts of the submission process. It reads a template along with the sequence and table files and outputs ASN.1 for submission to GenBank, so the submitter does not need to read each set of table and sequence files into Sequin.
Figure 1. Sample feature table.
>Feature Sc_16
1	7000	REFERENCE
			PubMed	8849441
<1	1050	gene
			gene	ATH1
<1	1009	CDS
			product	acid trehalase
			product	Ath1p
			codon_start	2
<1	1050	mRNA
			product	acid trehalase
[offset=2000]
1253	420	gene
			gene	YPR027C
1253	420	CDS
			product	Ypr027cp
			note	hypothetical protein
1253	420	mRNA
			product	Ypr027cp
2626	2535	gene
			gene	trnF
2626	2590	tRNA
2570	2535
			product	tRNA-Phe
2626	2590	exon
			number	1
2570	2535	exon
			number	2
3450	4536	gene
			gene	YIP2
3522	3572	CDS
3706	4197
			product	Yip2p
			prot_desc	similar to human polyposis locus protein 1 (YPD)
3450	3572	mRNA
3706	4536
			product	Yip2p
Figure 2. GenBank flatfile corresponding to the sample table.
LOCUS       Sc_16        7000 bp    DNA             PLN       08-MAY-2000
DEFINITION  Saccharomyces cerevisiae strain S288C chromosome XVI, partial
            sequence.
ACCESSION   Sc_16
VERSION
KEYWORDS    .
SOURCE      baker's yeast.
  ORGANISM  Saccharomyces cerevisiae
            Eukaryota; Fungi; Ascomycota; Hemiascomycetes; Saccharomycetales;
            Saccharomycetaceae; Saccharomyces.
REFERENCE   1  (bases 1 to 7000)
  AUTHORS   Goffeau,A., Barrell,B.G., Bussey,H., Davis,R.W., Dujon,B.,
            Feldmann,H., Galibert,F., Hoheisel,J.D., Jacq,C., Johnston,M.,
            Louis,E.J., Mewes,H.W., Murakami,Y., Philippsen,P., Tettelin,H.
            and Oliver,S.G.
  TITLE     Life with 6000 genes
  JOURNAL   Science 274 (5287), 546 (1996)
  PUBMED    8849441
REFERENCE   2  (bases 1 to 7000)
  AUTHORS   Ouellette,B.F.F.
  TITLE     Direct Submission
  JOURNAL   Submitted (08-MAY-2000) NCBI/NLM, National Institutes of Health,
            Building 38A, Room 8N805, Bethesda, MD 20894, USA
FEATURES             Location/Qualifiers
     source          1..7000
                     /organism="Saccharomyces cerevisiae"
                     /strain="S288C"
                     /chromosome="XVI"
     mRNA            <1..1050
                     /gene="ATH1"
                     /product="acid trehalase"
     gene            <1..1050
                     /gene="ATH1"
     CDS             <1..1009
                     /gene="ATH1"
                     /note="Ath1p"
                     /codon_start=2
                     /product="acid trehalase"
                     /translation="DHNGTIVHKSGDVPIHIKIPNRSLIHDQDINFYNGSENERKPNL
                     ERRDVDRVGDPMRMDRYGTYYLLKPKQELTVQLFKPGLNARNNIAENKQITNLTAGVP
                     GDVAFSALDGNNYTHWQPLDKIHRAKLLIDLGEYNEKEITKGMILWGQRPAKNISISI
                     LPHSEKVENLFANVTEIMQNSGNDQLLNETIGQLLDNAGIPVENVIDFDGIEQEDDES
                     LDDVQALLHWKKEDLAKLIEQIPRLNFLKRKFVKILDNVPVSPSEPYYEASRNQSLIE
                     ILPSNRTTFTIDYDKLQVGDKGNTDWRKTRYIVVAVQGVYDDYDDDNKGATIKEIVLN
                     D"
     mRNA            complement(2420..3253)
                     /gene="YPR027C"
                     /product="Ypr027cp"
     gene            complement(2420..3253)
                     /gene="YPR027C"
     CDS             complement(2420..3253)
                     /gene="YPR027C"
                     /note="hypothetical protein"
                     /codon_start=1
                     /product="Ypr027cp"
                     /translation="MVGIYRILASFVPLLGLLFAFHDDDMIDTVTIIKTVYETVTSTS
                     TAPAPAATKSVSEKKLDDTKLTLQVIQTMVSCFSVGENPANMISCGLGVVILMFSLII
                     ELINKLENDGINEPQRLYDLIKPKYVELPSNYVNEKIKTTFEPLDLYLGVNMNTSGSE
                     LNQNCLILKLGEKTALPFPGLAQQICYTKGASNEFTNYKLSDIQGNLNENSQGIANGV
                     FQKISNIRKISGNFKSQLYQISEKITDENWDGSAVGFTAHGREKGPNKSQISVSFYRD
                     N"
     gene            complement(4535..4626)
                     /gene="trnF"
     tRNA            complement(join(4535..4570,4590..4626))
                     /product="tRNA-Phe"
                     /gene="trnF"
     exon            complement(4535..4570)
                     /number=1
     exon            complement(4590..4626)
                     /number=2
     mRNA            join(5450..5572,5706..6536)
                     /gene="YIP2"
                     /product="Yip2p"
     gene            5450..6536
                     /gene="YIP2"
     CDS             join(5522..5572,5706..6197)
                     /gene="YIP2"
                     /note="similar to human polyposis locus protein 1 (YPD)"
                     /codon_start=1
                     /product="Yip2p"
                     /translation="MSEYASSIHSQMKQFDTKYSGNRILQQLENKTNLPKSYLVAGLG
                     FAYLLLIFINVGGVGEILSNFAGFVLPAYLSLVALKTPTSTDDTQLLTYWIVFSFLSV
                     IEFWSKAILYLIPFYWFLKTVFLIYIALPQTGGARMIYQKIVAPLTDRYILRDVSKTE
                     KDEIRASVNEASKATGASVH"
BASE COUNT     2201 a   1276 c   1255 g   2268 t
ORIGIN
        1 cgaccacaat ggtacgattg ttcataaatc aggagatgtt cctattcata taaagatacc
       61 aaacagatct ctaatacatg accaggatat caacttctat aatggttccg aaaacgaaag
      121 aaaaccaaat ctagagcgta gagacgtcga ccgtgttggt gatccaatga ggatggatag
[etc.]
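Comparing the sample table with the flatfile suggests how table coordinates map to flatfile locations: the [offset=2000] line shifts every later coordinate by 2000, an interval whose start is greater than its stop is shown as complement(...), and a feature with several intervals is shown as join(...). The sketch below reproduces that mapping for the sample data only. The function name `to_location` is hypothetical, and partial markers (`<`, `>`) on reversed intervals are deliberately left unhandled because the sample does not show them.

```python
def to_location(intervals, offset=0):
    """Convert feature-table intervals to a flatfile-style location string.

    intervals: list of (start, stop) strings as written in the table.
    offset: value taken from a preceding [offset=N] line, added to each coordinate.
    """
    def split(coord):
        # Separate an optional leading '<' or '>' partial marker from the number.
        marker = coord[0] if coord[0] in "<>" else ""
        return marker, int(coord[len(marker):]) + offset

    spans = []
    minus = False
    for start, stop in intervals:
        (m1, a), (m2, b) = split(start), split(stop)
        if a > b:
            if m1 or m2:
                raise NotImplementedError("partial reversed intervals are out of scope")
            minus = True
            a, b = b, a  # the flatfile lists the lower coordinate first
        spans.append((a, f"{m1}{a}..{m2}{b}"))

    spans.sort()  # ascending order, as in the flatfile
    text = ",".join(span for _, span in spans)
    if len(spans) > 1:
        text = f"join({text})"
    if minus:
        text = f"complement({text})"
    return text


print(to_location([("<1", "1009")]))
# <1..1009
print(to_location([("2626", "2590"), ("2570", "2535")], offset=2000))
# complement(join(4535..4570,4590..4626))
print(to_location([("3522", "3572"), ("3706", "4197")], offset=2000))
# join(5522..5572,5706..6197)
```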
Questions or Comments?
Write to the NCBI Service Desk
Revised December 14, 2009.