What is a Third Party Annotation:inferential (TPA:inferential) Sequence?

TPA:inferential: A database of sequences annotated by inference, where the source molecule or its product(s) have not been the subject of direct experimentation.

A TPA sequence is derived or assembled from primary sequence data currently found in the DDBJ/EMBL/GenBank databases. It can be genomic or mRNA sequence, and can be assembled or derived from primary genomic and/or mRNA sequences. These sequences are submitted to DDBJ/EMBL/GenBank as part of the process of publishing biological experiments that include the annotation of existing nucleotide sequences in the primary sequence database.

Examples of TPA:inferential

  • CDS and related annotation applied to a sequence derived from existing genomic and/or mRNA primary data with reported wet-lab experimental evidence for a homologous molecule but no direct wet-lab experimental evidence. The reported experimental evidence must have been generated by the submission group and must be published in a peer-reviewed journal.
  • CDS and related annotation applied to a sequence derived from existing genomic and/or mRNA primary data in addition to novel sequencing with no wet-lab experimental evidence. If the novel sequence was only used to bridge two pieces of sequence, there must be reported wet-lab experimental evidence for a homologous molecule.
  • Sequence and annotation covered in a review paper or discussion section, where wet-lab experimental evidence is reported, but not generated by the TPA submitter. The experimental evidence should be reported directly in the review paper or be from a paper by the author of the review paper.
  • Annotation of non-coding genes and transcripts that have no wet-lab experimental evidence for their existence and/or function but are submitted as part of a study. One or more of the study's sequences should be supported by experimental evidence and be in TPA:experimental or DDBJ/EMBL/GenBank. For an example of this type of study, see PubMed 14681587. The annotations cannot be generated by an annotation program such as tRNAscan.
  • Annotation of pseudogenes with no wet-lab experimental evidence, when submitted as part of a study that includes sequences of functional homologs of the pseudogene. One or more of the study's sequences should be supported by experimental evidence and be in TPA:experimental or DDBJ/EMBL/GenBank.
  • Annotation of pseudogenes that are not part of a gene study but for which there is experimental evidence. An example of experimental work done to support the description of a pseudogene can be found in PubMed 15908099.
  • A sequence submitted as part of a collection of annotated members of a gene family, where wet-lab experimental evidence does not exist for the annotation. One or more members of the set should be supported by experimental evidence and be in TPA:experimental or DDBJ/EMBL/GenBank.
  • A sequence representing an assembled genome or naturally occurring plasmid that includes features with assigned gene symbols or product identifiers, where the annotated features may be a mix of experimentally and inferentially determined data.

How Do TPA:inferential Sequence Records Differ from TPA:experimental and Other GenBank/EMBL/DDBJ Records?

The display of a TPA record is similar to other Collaboration records, but includes the following:

  • Keywords: TPA; Third Party Annotation; TPA:inferential.
  • The label 'TPA_inf:' at the beginning of each Definition Line.
  • PRIMARY field providing the base pair spans of the primary sequences that contribute to the TPA sequence.

Other Features and References are similar to those displayed in other GenBank/EMBL/DDBJ records.
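
The three markers listed above can be checked mechanically in a flat-file record. The following is a minimal Python sketch, not an official tool. It assumes the usual GenBank flat-file layout, in which top-level field names start in the first column and continuation lines are indented. The function names and the SAMPLE record (including its locus name and the placeholder accession XX000000.1) are hypothetical and for illustration only.

```python
def flatfile_field(record: str, name: str) -> str:
    """Return the text of a top-level flat-file field, joining continuation lines."""
    collected = []
    inside = False
    for line in record.splitlines():
        if line.startswith(name):
            inside = True
            collected.append(line[len(name):].strip())
        elif inside and line.startswith(" "):
            # Indented lines continue the current field.
            collected.append(line.strip())
        elif inside:
            break
    return " ".join(collected)


def tpa_inferential_markers(record: str) -> dict:
    """Report which of the three TPA:inferential display markers are present."""
    definition = flatfile_field(record, "DEFINITION")
    keywords = [
        k.strip().rstrip(".")
        for k in flatfile_field(record, "KEYWORDS").split(";")
    ]
    return {
        "definition_label": definition.startswith("TPA_inf:"),
        "keyword": "TPA:inferential" in keywords,
        "primary_field": any(
            line.startswith("PRIMARY") for line in record.splitlines()
        ),
    }


# Hypothetical record fragment, invented for this sketch (not a real entry).
SAMPLE = """\
LOCUS       EXAMPLE1     120 bp    mRNA    linear   PLN 01-JAN-2000
DEFINITION  TPA_inf: Hypothetical organism example gene mRNA, complete cds.
KEYWORDS    Third Party Annotation; TPA; TPA:inferential.
SOURCE      Hypothetical organism
PRIMARY     TPA_SPAN            PRIMARY_IDENTIFIER PRIMARY_SPAN        COMP
            1-120               XX000000.1         1-120
"""

if __name__ == "__main__":
    print(tpa_inferential_markers(SAMPLE))
```

A record that shows all three markers is displayed as TPA:inferential. The check says nothing about whether the underlying annotation meets the submission criteria.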

An example of a TPA:inferential submission is BK000554.

TPA sequence records are shared by all three Collaboration databases and can be found using typical search methods in EntrezNuc and EntrezProt (e.g., by submitter name, gene/protein name, or Accession Number).
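
For retrieval by Accession Number from a script, one option is the NCBI E-utilities EFetch service. The sketch below only builds the request URL; it is an illustration under the assumption that the nucleotide database is addressed as `nuccore` and that the flat file is requested with `rettype=gb` and `retmode=text`. The function name `efetch_flatfile_url` is hypothetical, and E-utilities is not mentioned in the text above.

```python
from urllib.parse import urlencode

# Base address of the EFetch service (assumed, see the note above).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_flatfile_url(accession: str) -> str:
    """Build a URL that requests a nucleotide record as a GenBank flat file."""
    params = {
        "db": "nuccore",     # nucleotide database
        "id": accession,     # Accession Number of the record
        "rettype": "gb",     # GenBank flat-file format
        "retmode": "text",   # plain text rather than XML
    }
    return f"{EFETCH}?{urlencode(params)}"


if __name__ == "__main__":
    # BK000554 is the TPA:inferential example cited in this document.
    print(efetch_flatfile_url("BK000554"))
```

The URL can be opened in a browser or passed to any HTTP client; the sketch deliberately makes no network request itself.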

How to Submit TPA Sequence Data

Sequences can be submitted to the TPA database through either BankIt or Sequin:

  • BankIt
    • Check 'No' to answer the question 'Is This Primary Sequence Data?'.
    • Enter the list of Accession Numbers of all the primary sequences used to assemble or derive the submitted sequence.
    • Provide explanation of all experimental evidence.
    • Complete standard submission process, being sure to annotate all new descriptive information (CDS, protein name, gene name, etc.) for the TPA sequence.
    • Sequence submission will be labeled as a TPA sequence and will be processed accordingly.
  • Sequin
    • Follow standard procedure for Sequin submission.
    • Choose Third Party Annotation from the Sequence Format window under Submission category.
    • The Assembly Tracking box will appear with the flatfile display. The primary Accession Number(s) used to assemble/derive the TPA sequence should be entered into this box.
    • Click on Accept; a new COMMENT field will appear in the flatfile, which will list the primary sequence Accession Numbers.
    • It is recommended that the submitter note in the email that contains the sequence submission that this is intended for TPA.

What should not be submitted to TPA:inferential

  • Sequences with annotation supported by experimental evidence. See TPA:experimental.
  • Synthetic constructs such as cloning vectors that use well characterized, publicly available genes, promoters, or terminators; these should be submitted as synthetic sequences for GenBank.
  • Microsatellites and related types of repeat regions.
  • Updates or changes to existing sequence data from another submitter; these should be submitted as new sequences for GenBank.
  • Annotation that has arisen from an automated tool, such as GeneMark, tRNAscan, or ORF Finder, where no further evidence, experimental or otherwise, is presented for the annotation.
  • Annotation from in vivo, in vitro, or in silico experimentation that will not be submitted for publication in a peer-reviewed journal.
