FAQ
Frequently asked questions about TPA
GenBank
Sequence submission support and software
BankIt
For quick and simple submissions
Sequin
Stand-alone sequence submission tool
TPA:EXP
Information about the TPA:experimental database
TPA:INF
Information about the TPA:inferential database
|
|
What is a Third Party
Annotation (TPA) Sequence? |
TPA: A database designed to capture experimental or inferential results that support submitter-provided annotation for sequence data that the
submitter did not directly determine but derived from GenBank primary data. TPA records are divided into two categories:
TPA:experimental: Annotation of sequence data is supported by peer-reviewed wet-lab experimental evidence.
TPA:inferential: Annotation of sequence data by inference (where the source molecule or its product(s) have not been the subject of direct experimentation)
TPA database records differ from GenBank and RefSeq records:
GenBank: An archival database of primary nucleotide sequences that were directly sequenced by the submitter.
RefSeq: A curated, non-redundant database that includes genomic DNA, transcript (RNA), and protein products, for major organisms.
The sequence data are derived from GenBank primary data, and the annotation is computational, from published literature,
or from domain experts.
A TPA sequence is derived or assembled from primary sequence data currently
found in the DDBJ/EMBL/GenBank International Nucleotide Sequence
Database. It can be genomic or mRNA sequence and can be assembled
or derived from primary genomic and/or mRNA sequences. TPA sequences are submitted
to DDBJ/EMBL/GenBank as part of the process of publishing biological experiments that
include the annotation of existing, primary nucleotide sequences.
Examples of TPA sequences are:
- mRNA assembled from overlapping EST sequences.
- mRNA derived from an unannotated section of genomic sequence by
comparison with another known mRNA from a different organism.
- mRNA assembled from overlapping EST sequences, other partial mRNAs,
and/or genomic sequences.
- previously unannotated genomic sequence now described with the
exons, introns, and coding region information (CDS) of a new gene.
Note: It is required that all new annotations will be experimentally determined to exist, directly or indirectly.
What is a primary
sequence? |
'Primary' sequences used to assemble a TPA sequence are those that have been
experimentally determined and are now publicly available in the GenBank/EMBL/DDBJ
databases. These may also be trace data sequences and Whole Genome Shotgun (WGS)
sequences. They may not be from a proprietary database. Each primary sequence
used to assemble a TPA sequence must be identified by an Accession Number in
the submission of the TPA sequence.
Reference sequences may not be cited as data used to build TPA sequences
since RefSeqs are not primary data. For example, sequences with Accession
Numbers such as NT_112233 or NW_123456 represent contig sequences; the sequences
used to assemble these contigs, which can be found at the bottom of contig
records, should be cited in a TPA sequence submission. Sequences with
Accession Numbers such as XM_345678 or NM_123456 are RefSeqs representing mRNAs
that are not experimentally determined and therefore cannot be cited as
primary data.
How Do TPA Sequence
Records Differ from Other GenBank/EMBL/DDBJ Records? |
The display of a TPA sequence is similar to other Collaboration records,
but includes the following:
- Keywords:
TPA;THIRD PARTY ANNOTATION; TPA:experimental TPA;THIRD PARTY ANNOTATION; TPA:inferential
- The label 'TPA_exp: or TPA_inf:' at the beginning of each Definition Line.
- PRIMARY field providing the base pair spans of the primary sequences
that contribute to the TPA sequence.
Other Features and References are similar to those displayed in other
GenBank/EMBL/DDBJ records.
An example of a TPA:experimental is
BK000016
An example of a TPA:inferential is
BK000554
TPA sequence records are shared by all three Collaboration databases and
can be found using typical search methods in EntrezNuc and EntrezProt
(ie, submitter name, gene/protein name, Accession Number, etc)
How
to Submit TPA Sequence Data |
Sequence can be submitted to the TPA database through either BankIt or Sequin:
BankIt
- Check 'No' to answer the question 'Is This Primary Sequence
Data?'.
- Input list of Accession Numbers of all the primary sequences
used to assemble or derive the submitted sequence.
- Provide explanation of all experimental evidence or other supporting evidence.
- Complete standard submission process, being sure to annotate all new descriptive
information (CDS, protein name, gene name, etc) for the TPA sequence.
- Sequence submission will be labeled as a TPA sequence and will be processed
accordingly.
Sequin
- Follow standard procedure for Sequin submission.
- Choose Third Party Annotation from the Sequence Format window under Submission category
- The Assembly Tracking box will appear with the flatfile display. The primary Accession Number(s) used to assemble/derive the TPA sequence should be entered into this
box.
- Click on Accept; new COMMENT field will appear in the flatfile, which will
list the primary sequence Accession Numbers.
- It is recommended that the submitter note in the email
that contains the sequence submission that this is intended for TPA.
General Information
- The entire submitted sequence must be covered by primary sequence data.
- There is no limit on the number of overlapping/adjoining primary sequences that
can be cited for a TPA submission.
- If sections of a sequence submitted to TPA have been newly determined by the
submitter, those sequences (if they are more than 50 nt) must first be submitted
to GenBank, processed, and released to the public before they can be cited as
primary sequences
- TPA sequences must cite the same organism as the primary sequence data.
When are TPA
sequences
released? |
- TPA sequences are held confidential until their Accession Numbers or sequence
data and/or annotation appear in a peer-reviewed publication in a biological
journal.
- No sequence accepted for the TPA database will be released to
the public until the submitter notifies us of its publication or we determine
independently that such information was published.
What should not be
submitted to TPA |
- Synthetic constructs such as cloning vectors that use well characterized,
publicly available genes, promoters, or terminators; these should be
submitted as synthetic sequences for GenBank.
- Microsatellites and related types of repeat regions
- New sequence that updates or changes existing sequence data from another submitter; these
should be submitted as new sequences for GenBank.
- Annotation that has arisen from an automated tool, such as
GeneMark, tRNA scan or ORF finder, where no further evidence,
experimental or otherwise, is presented for the annotation.
- Annotation from in vivo, in vitro, or in silico experimentation that will not be submitted for publication in a peer-reviewed journal.
Disclaimer
Privacy statement
Revised January 13, 2006
|