NCBI Transcriptome Shotgun Assembly Sequence Database

What is a Transcriptome Shotgun Assembly (TSA) Sequence?

TSA is an archive of computationally assembled sequences built from primary data submitted to dbEST, the Short Read Archive (SRA), or the Trace Archive. Overlapping sequence reads from a complete transcriptome are assembled into transcripts by computational methods, rather than by the traditional approach of cloning cDNAs and sequencing the clones. The primary sequence data and the assemblies built from them must be submitted by the same submitter. TSA records differ from EST and GenBank records in that a TSA record does not assert that a physical counterpart (such as a cDNA clone) exists for the assembly.

What is a TSA primary sequence?

The primary sequences used to assemble a TSA sequence must have been experimentally determined by the submitter of the assemblies and must be publicly available in the Trace Archive, the Short Read Archive, or dbEST. They may not come from a proprietary database; in particular, any ESTs used to create the assemblies must be publicly available. Submitters can request that the TSA sequences themselves be held until a specified release date.

How Do TSA Sequence Records Differ from Other GenBank/EMBL/DDBJ Records?

The display of a TSA sequence is similar to other International Nucleotide Sequence Database Collaboration (INSDC) records, but includes the following:

  • The keywords 'TSA; Transcriptome Shotgun Assembly'.
  • The label 'TSA:' at the beginning of each definition line.
  • A link to the Transcriptome Shotgun Assembly project.
  • For assemblies built from ESTs, a PRIMARY field giving the base-pair spans of the primary sequences that contribute to the TSA assembly.
  • Alternatively, if the assembly was submitted to the Assembly Archive, a link to the assembly archive file under DBLINK.

Other Features and References are similar to those displayed in regular GenBank/EMBL/DDBJ records.
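
As a rough illustration, the TSA-specific elements listed above can be detected in a GenBank flat-file record with a few lines of Python. This is a minimal sketch, assuming the standard flat-file layout in which a top-level field name occupies columns 1-12 and its value starts at column 13. The function names and the synthetic record in the usage line are hypothetical; they are not NCBI tools and not real record data.

```python
def top_level_fields(record: str) -> dict[str, str]:
    """Map each top-level field name (columns 1-12) to its value text.

    Assumes the GenBank flat-file layout: a new field starts in column 1,
    and continuation or sub-field lines start with blanks; those lines are
    appended to the field that precedes them. A repeated field name (such
    as REFERENCE) keeps only its last occurrence, which is enough here.
    """
    fields: dict[str, str] = {}
    current = None
    for line in record.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            current = line[:12].strip()
            fields[current] = line[12:].strip()
        elif current is not None:
            fields[current] += " " + line.strip()
    return fields


def tsa_markers(record: str) -> dict[str, bool]:
    """Report which TSA-specific display elements a flat-file record contains."""
    fields = top_level_fields(record)
    keywords = [k.strip(" .") for k in fields.get("KEYWORDS", "").split(";")]
    return {
        "tsa_keywords": "TSA" in keywords
        and "Transcriptome Shotgun Assembly" in keywords,
        "tsa_definition": fields.get("DEFINITION", "").startswith("TSA:"),
        "primary_spans": "PRIMARY" in fields,  # expected for EST-based assemblies
        "dblink": "DBLINK" in fields,          # expected for Assembly Archive links
    }


# Synthetic three-line record, made up for this sketch.
example = (
    "DEFINITION  TSA: hypothetical example transcript.\n"
    "KEYWORDS    TSA; Transcriptome Shotgun Assembly.\n"
    "PRIMARY     (span table omitted in this synthetic example)\n"
)
print(tsa_markers(example))
# {'tsa_keywords': True, 'tsa_definition': True, 'primary_spans': True, 'dblink': False}
```

Parsing by fixed columns keeps the sketch dependency-free; a full parser (for example, one from a bioinformatics library) would be more robust for whole records.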

An example of a TSA sequence assembled from dbEST records is EZ000001.

An example of a TSA sequence assembled from SRA sequences is EZ007475.

TSA sequence records are shared by all three INSDC databases and can be found with the usual Entrez Nucleotide and Entrez Protein searches (e.g., by submitter name, gene or protein name, or accession number).
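
Entrez can also be queried programmatically. The sketch below only builds request URLs for NCBI's E-utilities (esearch and efetch), which this page does not mention, so treat it as an illustration rather than a documented TSA workflow. The helper names are hypothetical; `rettype=gb` with `retmode=text` asks efetch for a GenBank flat file, and `db` can be switched to `protein` for Entrez Protein.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, db: str = "nuccore") -> str:
    """URL that searches an Entrez database (e.g. 'nuccore' or 'protein')."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": db, "term": term})


def efetch_genbank_url(accession: str, db: str = "nuccore") -> str:
    """URL that retrieves one record as a GenBank flat file."""
    params = {"db": db, "id": accession, "rettype": "gb", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?" + urlencode(params)


print(esearch_url("EZ007475[accn]"))
print(efetch_genbank_url("EZ000001"))
```

Opening the efetch URL (for example with `urllib.request.urlopen`) returns the record text; NCBI's E-utilities usage guidelines apply to such requests.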

How to Submit TSA Sequence Data

For all TSA submissions, your project must be registered in the Projects database as a Transcriptome Shotgun Assembly project. (At present, SRA or GenBank staff must register the project for you.)

The submission process depends on the type of primary sequence data used to assemble the TSA sequences. If you used next-generation sequencing technology, submit the TSA primary sequence data to SRA, not to dbEST.
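
The routing rule can be written as a small lookup. This is only a sketch: the dictionary keys are hypothetical labels, and the "[A]"/"[B]" values refer to the two submission paths described on this page.

```python
# Archive for the primary reads, and which submission path ([A] or [B])
# applies, for each kind of primary data described on this page.
# The keys are hypothetical labels chosen for this sketch.
ROUTES = {
    "est": ("dbEST", "A"),
    "next-generation": ("SRA", "B"),
    "trace": ("Trace Archive", "B"),  # path [B] also allows SRA for trace data
}


def submission_route(data_kind: str) -> str:
    """Describe where to deposit primary data of the given kind."""
    if data_kind not in ROUTES:
        raise ValueError(f"unknown primary data kind: {data_kind!r}")
    archive, path = ROUTES[data_kind]
    return f"submit primary data to {archive}; follow path [{path}]"


print(submission_route("next-generation"))
# submit primary data to SRA; follow path [B]
```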

[A] Assemblies generated from EST sequences. 

  • Submit EST sequences to dbEST if not already done.
  • Create a tab-delimited table of all the ESTs used to make each assembly. For example, the assembly table for a TSA record with the localID ABCD1234 would look like this:
    localID:	ESTs:	
    ABCD1234	BU795116, BU719253, CV686614
    
  • Confirm that all EST sequences used to create the assembly are available in the public database.
  • The submission file can be generated using Sequin or tbl2asn. See "Creating the submission file" below for more information about using Sequin and tbl2asn.
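
A minimal sketch of producing the assembly table from a mapping of localIDs to EST accessions, assuming the column labels shown in the example and a comma-plus-space separator between accessions (as in the example row). The function name is hypothetical, and the empty-list check is a sanity check added here, not a documented rule.

```python
def assembly_table(assemblies: dict[str, list[str]]) -> str:
    """Render one tab-delimited row per assembly: localID, then its ESTs."""
    rows = ["localID:\tESTs:"]  # column labels copied from the example above
    for local_id, ests in assemblies.items():
        if not ests:  # sanity check added for this sketch
            raise ValueError(f"{local_id}: no ESTs listed for this assembly")
        rows.append(f"{local_id}\t{', '.join(ests)}")
    return "\n".join(rows) + "\n"


print(assembly_table({"ABCD1234": ["BU795116", "BU719253", "CV686614"]}), end="")
```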

[B] Assemblies generated from SRA or trace data.

To submit TSA sequences generated from SRA or trace data, you must submit the relevant SRA or trace data as well as the assembly data used to generate the TSA sequences. You can submit this information in three separate steps or as one bulk submission.

  • Submitting the SRA or trace files, assembly, and TSA files separately.

    • Submit the primary sequence data to either the Trace Archive database or the SRA database.
    • Submit the assemblies of these sequences (for example, the .ace files) to the Assembly Archive.
    • The TSA submission file can be generated using Sequin or tbl2asn. See "Creating the submission file" below for more information about using Sequin and tbl2asn.
    • The sequence identifier (localID) in the ASN.1 file should be the same as the corresponding Assembly Archive identifier (ai).

  • Submitting the SRA or trace files, assembly, and TSA files as a bulk submission.

    • Bulk submissions must be uploaded through an FTP account. Before you start, contact gb-admin@ncbi.nlm.nih.gov for account information.
    • Create the template (.sbt) file with Sequin. See "Creating the template file (.sbt)" below for details.
    • Deposit the SRA or trace files, the assemblies (.ace files), and the TSA template file (.sbt) in the FTP account as a single gzipped tar file ("tarball").
    • The TSA ASN.1 files will then be generated from the .ace files and the .sbt file.
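
Packaging the bulk submission can be sketched with Python's standard `tarfile` module. The function and file names are hypothetical, and the checks (at least one .ace file, exactly one .sbt file) are assumptions drawn from the steps above rather than a documented NCBI requirement.

```python
import tarfile
import tempfile
from pathlib import Path


def build_bulk_tarball(files: list[Path], out_path: Path) -> Path:
    """Pack read files, .ace assemblies and the .sbt template into one .tar.gz."""
    suffixes = [f.suffix for f in files]
    if ".ace" not in suffixes:
        raise ValueError("no .ace assembly file found")
    if suffixes.count(".sbt") != 1:
        raise ValueError("expected exactly one .sbt template file")
    with tarfile.open(out_path, "w:gz") as tar:  # gzip-compressed tar file
        for f in files:
            tar.add(f, arcname=f.name)  # store without the directory prefix
    return out_path


if __name__ == "__main__":
    # Hypothetical placeholder files, for demonstration only.
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        names = ["reads.sff", "assembly1.ace", "template.sbt"]
        for name in names:
            (tmp_dir / name).write_text("placeholder\n")
        archive = build_bulk_tarball(
            [tmp_dir / n for n in names], tmp_dir / "tsa_submission.tar.gz"
        )
        with tarfile.open(archive) as tar:
            print(sorted(tar.getnames()))
        # ['assembly1.ace', 'reads.sff', 'template.sbt']
```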

Creating the submission file

General Information

  • The library information for the individual reads should be annotated on the source feature.
  • The entire submitted sequence must be assembled from primary sequence data.
  • There is no limit on the number of overlapping/adjoining primary sequences that can be cited for a TSA submission.
  • TSA sequences must cite the same organism as the primary sequence data.

Submission files for TSA can be created with either Sequin or tbl2asn:

Sequin
  • Follow standard procedure for Sequin submission.
  • When emailing your Sequin file, state in the message that the submission is intended for TSA.

tbl2asn
  • tbl2asn reads a template along with sequence and table files and outputs ASN.1 for submission to GenBank. It requires the sequence and annotation files to follow specific naming conventions: the FASTA-formatted sequence file has the extension ".fsa", and the five-column, tab-delimited table file has the extension ".tbl". The base name of the .tbl file must be identical to that of the .fsa file; otherwise tbl2asn will not recognize it and will not include the annotation in the ".sqn" output file it generates.
  • For TSA submissions, you must include the modifier [tech=TSA] in the definition line of each sequence in the .fsa file.
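
The naming rule (same base name, .fsa and .tbl extensions) and the [tech=TSA] requirement can be sketched in Python. This writes no files and does not run tbl2asn; the record layout, the 60-character line wrap, the placement of an [organism=...] modifier and the helper names are assumptions for illustration. The feature table uses the five-column layout (start, stop, feature key, qualifier, value), with each sequence's features introduced by a ">Feature" line.

```python
def wrap(seq: str, width: int = 60) -> str:
    """Split a sequence into lines of at most `width` characters."""
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))


def tbl2asn_inputs(basename: str, records: list[dict]) -> dict[str, str]:
    """Return {filename: text} for a matching .fsa/.tbl pair.

    Each record is a dict with hypothetical keys 'local_id', 'organism',
    'sequence' and 'features'; each feature is
    (start, stop, key, [(qualifier, value), ...]).
    """
    fsa, tbl = [], []
    for r in records:
        # FASTA definition line carrying the modifiers, including [tech=TSA].
        fsa.append(f">{r['local_id']} [organism={r['organism']}] [tech=TSA]")
        fsa.append(wrap(r["sequence"]))
        # Five-column feature table for the same sequence identifier.
        tbl.append(f">Feature {r['local_id']}")
        for start, stop, key, quals in r["features"]:
            tbl.append(f"{start}\t{stop}\t{key}")
            for qual, value in quals:
                tbl.append(f"\t\t\t{qual}\t{value}")
    # Same base name for both files, as tbl2asn requires.
    return {
        f"{basename}.fsa": "\n".join(fsa) + "\n",
        f"{basename}.tbl": "\n".join(tbl) + "\n",
    }


files = tbl2asn_inputs("tsa_batch", [{
    "local_id": "ABCD1234",
    "organism": "Genus species",  # placeholder organism name
    "sequence": "ACGT" * 30,      # placeholder sequence
    "features": [(1, 120, "gene", [("gene", "exampleA")])],
}])
for name, text in files.items():
    print(f"== {name} ==")
    print(text, end="")
```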

Creating the template file (.sbt)
  • Start Sequin and choose to start a new submission.
  • Enter a manuscript title, if desired.
  • Enter the contact, author, and affiliation information.
  • Return to the Submission tab and choose File->Export Submitter Info.
  • Save the file as template.sbt.

What should not be submitted to TSA
  • Assemblies of sequences that were not sequenced directly by the submitter of the assemblies.
  • Clone-based assemblies. These should be submitted to GenBank.


Revised April 2, 2009