What is a Transcriptome Shotgun Assembly (TSA) Sequence?
TSA is an archive of computationally assembled sequences from primary data submitted to
dbEST, the Short Read Archive (SRA), or the Trace Archive. The overlapping sequence reads from a complete transcriptome are assembled into transcripts
by computational methods instead of by traditional cloning and sequencing of cloned cDNAs. The assemblies
and the primary sequence data used to build them must be submitted by the same submitter.
TSA sequence records differ from EST and GenBank records because there are no physical counterparts
to the assemblies asserted in the TSA record.
What is a TSA primary sequence?
The primary sequences used to assemble a TSA
sequence have been experimentally determined by the
submitter of the assemblies and are now publicly available in the Trace
Archive, the Short Read Archive, or dbEST databases. They may not be from
a proprietary database. In addition, the ESTs used to create the assemblies must be publicly available. A hold-until-release date can be requested for the TSA sequences.
How Do TSA Sequence Records Differ from Other GenBank/EMBL/DDBJ Records?
The display of a TSA sequence is similar to other
International Nucleotide Sequence Database Collaboration (INSDC) records,
but includes the following:
- Keywords: TSA; Transcriptome Shotgun Assembly
- The label 'TSA:' at the beginning of each
Definition Line.
- Link to the Transcriptome Shotgun Assembly
project.
- PRIMARY field providing the base pair spans of
the primary sequences that contribute to the TSA assembly if assembled
from ESTs.
- Alternatively, if the assembly was submitted to
the assembly archive there will be a link to the assembly archive file
under DBLINK.
Other Features and References are similar to those
displayed in regular GenBank/EMBL/DDBJ records.
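The markers listed above can be checked mechanically. The following is a minimal Python sketch, assuming the record is available as GenBank-style flat-file text in which each top-level field name starts in the first column. The function name `tsa_markers` and the sample record are hypothetical; the sample is a made-up toy record, not real data.

```python
def tsa_markers(flatfile_text):
    """Report which TSA-specific markers appear in a GenBank-style flat file.

    Assumption: top-level field names (DEFINITION, KEYWORDS, ...) start in
    column 1 and continuation lines are indented.
    """
    fields = {}
    current = None
    for line in flatfile_text.splitlines():
        if line.startswith("//"):          # end-of-record marker
            break
        if line[:1].strip():               # non-blank first column: new field
            parts = line.split(None, 1)
            current = parts[0]
            fields[current] = parts[1].strip() if len(parts) > 1 else ""
        elif current is not None and line.strip():
            fields[current] += " " + line.strip()   # continuation line

    keywords = [k.strip() for k in fields.get("KEYWORDS", "").rstrip(".").split(";")]
    return {
        "keyword_tsa": "TSA" in keywords,
        "keyword_full": "Transcriptome Shotgun Assembly" in keywords,
        "definition_label": fields.get("DEFINITION", "").startswith("TSA:"),
        "has_primary": "PRIMARY" in fields,   # expected for EST-based assemblies
        "has_dblink": "DBLINK" in fields,     # expected for assembly-archive submissions
    }


# Made-up toy record for illustration only; not a real database entry.
SAMPLE = """\
LOCUS       TOY000001
DEFINITION  TSA: Toy organism contig_1, mRNA
            sequence.
KEYWORDS    TSA; Transcriptome Shotgun Assembly.
//
"""

print(tsa_markers(SAMPLE))
```

A record need not carry both PRIMARY and DBLINK: per the list above, which one appears depends on how the assembly was submitted.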
An example of a TSA sequence assembled from dbEST
records is EZ000001.
An example of a TSA assembled from SRA sequences
is EZ007475.
TSA sequence records are shared by all three INSDC databases
and can be found using typical search methods in Entrez Nucleotide and Entrez Protein
(e.g., submitter name, gene/protein name, or accession number).
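For scripted lookups by accession number, one option is NCBI's E-utilities `efetch` endpoint. This is an assumption on our part: the text above describes only interactive Entrez searches. The sketch below only builds the request URL and performs no network access; the helper name `efetch_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities efetch service (assumed, not named in the text above).
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="gb", retmode="text"):
    """Build an efetch URL that requests one nucleotide record as a flat file."""
    query = urlencode(
        {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    )
    return f"{EUTILS_EFETCH}?{query}"


# EZ000001 is the dbEST-based example accession cited above.
print(efetch_url("EZ000001"))
# prints https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=EZ000001&rettype=gb&retmode=text
```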
How to Submit TSA Sequence Data
For all TSA submissions, you must register your project in the Projects database as a Transcriptome Shotgun
Assembly project. (At this time the SRA or GenBank staff will need
to do this for you.)
The submission process is based on the type
of primary sequence data used to assemble the TSA sequences. If you are using
next-generation sequencing technology, submit the TSA primary sequence data to SRA and not to dbEST.
[A] Assemblies generated from EST sequences.
[B] Assemblies generated from SRA or trace data.
To submit TSA sequences generated from SRA or
trace data you need to submit the relevant SRA or trace data and
assembly data used to generate the TSA sequences. You can submit this
information in three separate steps or as one bulk submission.
- Submitting the SRA or trace files, assembly,
  and TSA files separately.
  - Submit the primary sequence data to either
    the Trace Archive database or the SRA database.
  - Submit the assemblies of these sequences (for
    example, the .ace files) to the assembly archive.
  - Generate the TSA submission file using
    Sequin or tbl2asn. See "Creating the
    submission file" for more information about using Sequin and
    tbl2asn.
    - The sequence identifier
      (localID) in the ASN.1 file should be the
      same as the corresponding assembly archive identifier (ai).
- Submitting the SRA or trace files, assembly, and TSA files as a bulk
  submission.
  - Bulk submissions must be submitted using an
    FTP account. First, contact gb-admin@ncbi.nlm.nih.gov
    for account information.
  - Create the template (.sbt) file with
    Sequin. See "Creating the submission file" for more
    information about creating the .sbt file.
  - Deposit the SRA or trace files, assembly
    (.ace files), and TSA template file (.sbt) in the FTP account as one
    gzipped tar file ('tarball').
  - The TSA ASN.1 files will be generated from
    the .ace files and .sbt file.
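The packaging step for a bulk submission can be sketched with Python's standard `tarfile` module. This is only an illustration under stated assumptions: the function name `build_bulk_tarball` and the file names in the usage note are hypothetical, and the check for one .ace and one .sbt file is our own sanity check, not a documented NCBI requirement on the archive layout.

```python
import tarfile
from pathlib import Path


def build_bulk_tarball(files, tarball_path):
    """Pack the read, assembly (.ace), and template (.sbt) files into one .tar.gz."""
    paths = [Path(f) for f in files]
    suffixes = {p.suffix for p in paths}
    # Sanity check (our assumption): a bulk submission needs at least
    # one .ace assembly file and one .sbt template file.
    for required in (".ace", ".sbt"):
        if required not in suffixes:
            raise ValueError(f"no {required} file among the inputs")

    with tarfile.open(str(tarball_path), "w:gz") as tar:
        for p in paths:
            # Store each file under its bare name so the archive is flat.
            tar.add(str(p), arcname=p.name)
    return Path(tarball_path)
```

For example, `build_bulk_tarball(["reads_data", "contigs.ace", "template.sbt"], "tsa_bulk.tar.gz")` would produce a single gzipped archive ready to place in the FTP account, assuming those (hypothetical) files exist.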
Creating the submission file
General Information
- The library information for the individual
reads should be annotated on the source feature.
- The entire submitted sequence must be assembled from
primary sequence data.
- There is no limit on the number of
overlapping/adjoining primary sequences that can be cited for a TSA
submission.
- TSA sequences must cite the same organism as
the primary sequence data.
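The requirement that the entire submitted sequence be assembled from primary data amounts to a coverage condition on the spans of the contributing primary sequences. The sketch below shows one way to check it before submitting. It assumes the spans are available as 1-based, inclusive `(start, end)` pairs on the assembled sequence; that representation and the function name `uncovered_regions` are hypothetical and do not describe how NCBI validates submissions.

```python
def uncovered_regions(assembly_length, spans):
    """Return the parts of the assembly not covered by any primary sequence.

    spans: iterable of (start, end) pairs, 1-based and inclusive (assumed format).
    An empty result means the whole assembly is backed by primary data.
    """
    gaps = []
    next_uncovered = 1                      # first position not yet covered
    for start, end in sorted(spans):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= assembly_length:   # tail of the assembly left uncovered
        gaps.append((next_uncovered, assembly_length))
    return gaps


print(uncovered_regions(100, [(1, 60), (50, 100)]))   # prints []
print(uncovered_regions(100, [(1, 40), (51, 100)]))   # prints [(41, 50)]
```

Overlapping or adjoining spans are handled naturally, which matches the statement above that any number of them may be cited.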
Sequences can be created for the TSA database
with either Sequin or tbl2asn:
Sequin
- Follow standard procedure for Sequin submission.
- Include a note in
the email containing your Sequin file that the submission is intended for TSA.
tbl2asn
- tbl2asn reads a template along with the
sequence and table files, and outputs ASN.1 for submission to GenBank.
tbl2asn requires that the sequence and annotation files follow specific
naming conventions. The FASTA-formatted sequence file has ".fsa" as an
extension, and the five-column tab-delimited table file has ".tbl" as an
extension. The base name of the .tbl file must be identical to that of
the .fsa file for tbl2asn to recognize it and to include the annotation
in the output ".sqn" file that it generates.
- For TSA submissions, you must include the
technique [tech=TSA] in the ".fsa" file.
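The naming and `[tech=TSA]` conventions described above can be checked before running tbl2asn. The following Python sketch is only an illustration: the function name `check_tbl2asn_inputs` is hypothetical, and it assumes the `[tech=TSA]` modifier belongs on each FASTA definition line (the lines starting with ">"), which the text above does not state explicitly.

```python
from pathlib import Path


def check_tbl2asn_inputs(directory):
    """List problems with the .fsa/.tbl files in a directory before running tbl2asn."""
    problems = []
    for fsa in sorted(Path(directory).glob("*.fsa")):
        # The annotation table must share the base name of the sequence file.
        tbl = fsa.with_suffix(".tbl")
        if not tbl.exists():
            problems.append(
                f"{fsa.name}: no matching {tbl.name} (annotation would be skipped)"
            )
        # Assumption: every FASTA definition line should carry [tech=TSA].
        for line in fsa.read_text().splitlines():
            if line.startswith(">") and "[tech=TSA]" not in line:
                problems.append(f"{fsa.name}: definition line lacks [tech=TSA]: {line}")
    return problems
```

An empty list means every `.fsa` file has a same-named `.tbl` file and every definition line carries the modifier.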
Creating the template file (.sbt)
- Choose to start a new submission with Sequin.
- Enter manuscript title if desired.
- Enter contact, authors and affiliation
information.
- Return to submission tab and use
File->Export Submitter Info.
- Save as template.sbt.
What should not be submitted to TSA
- Assemblies from sequences not directly
sequenced by the submitter of the assemblies.
- Clone-based assemblies. These should be submitted to GenBank.
Revised April 2, 2009