New Evidence Qualifiers  

At the annual meeting of the International Nucleotide Sequence Databases (INSD), DDBJ, EMBL and GenBank agreed to adopt two new qualifiers to describe the evidence for features in sequence records. These are "/experiment=text" and "/inference=TYPE:text", where 'TYPE' is chosen from a restricted list and 'text' is structured text (described below). These new qualifiers replace "evidence=experimental" and "evidence=non-experimental", respectively, which will no longer be supported.

/experiment

Definition: a brief description of the nature of the experimental evidence that supports the feature identification or assignment.

Value format: "text"

Examples:
/experiment="Northern blot"
/experiment="heterologous expression system of Xenopus laevis oocytes"

Comment: detailed experimental information should not be included; it would normally be found in the cited publications.

/inference

Definition: a structured description of non-experimental evidence that supports the feature identification or assignment.

Value format: "TYPE[ (same species)][:EVIDENCE_BASIS]" where TYPE is restricted to one of the following 11 choices:

/inference="similar to sequence"
/inference="similar to AA sequence"
/inference="similar to DNA sequence"
/inference="similar to RNA sequence"
/inference="similar to RNA sequence, mRNA"
/inference="similar to RNA sequence, EST"
/inference="similar to RNA sequence, other RNA"
/inference="profile"
/inference="nucleotide motif"
/inference="protein motif"
/inference="ab initio prediction"
  • The optional text "(same species)" can be included when the inference comes from the same species as the entry.
  • The "EVIDENCE_BASIS" is text that gives a reference to a database entry (including accession and version) or an algorithm (including version). The accession.version number of a database record and the version number of an algorithm are separated from the database or algorithm name by a colon, as seen in the examples.
  • Examples:
    /inference="similar to DNA sequence:INSD:AY411252.1"
    /inference="similar to RNA sequence, mRNA:RefSeq:NM_000041.2"
    /inference="similar to DNA sequence (same species):INSD:AACN010222672.1"
    /inference="profile:tRNAscan:2.1"
    /inference="protein motif:InterPro:IPR001900"
    /inference="ab initio prediction:Genscan:2.0"

    Several things to note about /inference are:

    • When citing a GenBank record, use INSD (International Nucleotide Sequence Database).
    • When citing a RefSeq record (recognized by the underscore between the letters and the digits), use RefSeq.
    • Include the version of the algorithm that was used, and separate the version from the algorithm name with a colon, e.g. Genscan:2.0.
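
The value format described above can be checked mechanically. The following is a minimal Python sketch, not an official validator: the names `INFERENCE_TYPES`, `Inference` and `parse_inference` are hypothetical, and the sketch assumes that the first colon in the value always separates the TYPE (with its optional "(same species)" text) from the EVIDENCE_BASIS, which holds for the 11 types listed because none of them contains a colon.

```python
from typing import NamedTuple, Optional

# The 11 permitted TYPE values, copied from the list above.
INFERENCE_TYPES = (
    "similar to sequence",
    "similar to AA sequence",
    "similar to DNA sequence",
    "similar to RNA sequence",
    "similar to RNA sequence, mRNA",
    "similar to RNA sequence, EST",
    "similar to RNA sequence, other RNA",
    "profile",
    "nucleotide motif",
    "protein motif",
    "ab initio prediction",
)

SAME_SPECIES = " (same species)"


class Inference(NamedTuple):
    """Parsed parts of an /inference value (hypothetical helper type)."""
    type: str
    same_species: bool
    evidence_basis: Optional[str]


def parse_inference(value: str) -> Inference:
    """Split 'TYPE[ (same species)][:EVIDENCE_BASIS]' into its parts.

    Raises ValueError if TYPE is not one of the 11 permitted values or
    if a colon is present but nothing follows it.
    """
    # Assumption: no TYPE contains a colon, so the first colon ends it.
    head, colon, basis = value.partition(":")
    same_species = head.endswith(SAME_SPECIES)
    if same_species:
        head = head[: -len(SAME_SPECIES)]
    if head not in INFERENCE_TYPES:
        raise ValueError(f"unknown inference TYPE: {head!r}")
    if colon and not basis:
        raise ValueError("empty EVIDENCE_BASIS after ':'")
    return Inference(head, same_species, basis if colon else None)


if __name__ == "__main__":
    print(parse_inference(
        "similar to DNA sequence (same species):INSD:AACN010222672.1"))
```

The sketch checks only the TYPE and the overall shape of the value; it does not verify that the database name, accession.version or algorithm version in the EVIDENCE_BASIS actually exists.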

    Old evidence qualifiers

    Previous instances of "evidence=experimental" and "evidence=non-experimental" will be automatically converted to

    /experiment="experimental evidence, no additional details recorded"
    /inference="non-experimental evidence, no additional details recorded"

    but these phrases may not be used on new records.
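
For submitters who keep their own annotation files, the same mapping can be written down as a small Python sketch. It only illustrates the replacement described above and is not the databases' own conversion code; the names `LEGACY_TO_NEW` and `convert_legacy_evidence` are hypothetical.

```python
# Mapping from the value of the old "evidence" qualifier to the new
# qualifier name and its placeholder text, as stated above.
LEGACY_TO_NEW = {
    "experimental": (
        "experiment",
        "experimental evidence, no additional details recorded",
    ),
    "non-experimental": (
        "inference",
        "non-experimental evidence, no additional details recorded",
    ),
}


def convert_legacy_evidence(value: str) -> str:
    """Return the flatfile-style replacement for an old evidence value.

    Raises ValueError for anything other than the two legacy values.
    """
    try:
        qualifier, text = LEGACY_TO_NEW[value]
    except KeyError:
        raise ValueError(f"not a legacy evidence value: {value!r}") from None
    return f'/{qualifier}="{text}"'


if __name__ == "__main__":
    for old in ("experimental", "non-experimental"):
        print(f"evidence={old} -> {convert_legacy_evidence(old)}")
```

Because the placeholder phrases may not be used on new records, a function like this is only appropriate for reading or migrating existing annotation, not for preparing new submissions.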

    How to add the new qualifiers to your genome submissions

    New versions of tbl2asn and Sequin support these new qualifiers.

    To use the new qualifiers in tbl2asn, include them in the .tbl file as you do other qualifiers.

    Example .tbl file:

    In this example, the first CDS is predicted by Genscan 2.0; the second CDS was identified by its similarity to EST H22345.1 from the same species; the third CDS was identified by its similarity to the mRNA in GenBank (INSD) record AY123456.2 and by its InterPro domain IPR001900; and the fourth CDS has experimental expression evidence.

     

    >Feature  ExampleSeq
    
    1     100   gene  
                      locus_tag   Test_0001
    1     100   CDS
                      product     Test_0001
                      protein_id  gnl|center_name|Test_0001
                      inference   ab initio prediction:Genscan:2.0
    200   300   gene
                      locus_tag   Test_0002
    200   300   CDS
                      product     Test_0002
                      protein_id  gnl|center_name|Test_0002
                      inference   similar to RNA sequence, EST (same species):INSD:H22345.1
    400   500   gene  
                      locus_tag   Test_0003
    400   500   CDS
                      product     Test_0003
                      protein_id  gnl|center_name|Test_0003
                      inference   similar to RNA sequence, mRNA:INSD:AY123456.2
                      inference   protein motif:InterPro:IPR001900
    600   700   gene  
                      locus_tag   Test_0004
    600   700   CDS
                      product     Test_0004
                      protein_id  gnl|center_name|Test_0004
                      experiment  expression of GST fusion protein                
    
    The resulting flatfile looks like this:
    
         gene            1..100
                         /locus_tag="Test_0001"
         CDS             1..100
                         /locus_tag="Test_0001"
                         /inference="ab initio prediction:Genscan:2.0"
                         /codon_start=1
                         /product="Test_0001"
                         /translation="M...."
         gene            200..300
                         /locus_tag="Test_0002"
         CDS             200..300
                         /locus_tag="Test_0002"
                         /inference="similar to RNA sequence, EST (same
                         species):INSD:H22345.1"
                         /codon_start=1
                         /product="Test_0002"
                         /translation="M...."
         gene            400..500
                         /locus_tag="Test_0003"
         CDS             400..500
                         /locus_tag="Test_0003"
                         /inference="protein motif:InterPro:IPR001900"
                         /inference="similar to RNA sequence, mRNA:INSD:AY123456.2"
                         /codon_start=1
                         /product="Test_0003"
                         /translation="M...."
         gene            600..700
                         /locus_tag="Test_0004"
         CDS             600..700
                         /locus_tag="Test_0004"
                         /experiment="expression of GST fusion protein"
                         /codon_start=1
                         /product="Test_0004"
                         /translation="M...."
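
When annotation is generated by a pipeline, the gene/CDS blocks of the example .tbl file above can be written programmatically. The following Python sketch is one possible way to do that, under stated assumptions: the function name `cds_block` is hypothetical, the columns are assumed to be tab-separated (start, stop, feature key on feature lines; three empty columns followed by qualifier name and value on qualifier lines), and the product is set to the locus_tag only because the example above does so.

```python
from typing import Iterable, Tuple

EVIDENCE_QUALIFIERS = ("experiment", "inference")


def cds_block(start: int, stop: int, locus_tag: str, protein_id: str,
              evidence: Iterable[Tuple[str, str]]) -> str:
    """Return one gene + CDS block in the layout of the example .tbl file.

    `evidence` is a sequence of (qualifier, value) pairs, where the
    qualifier is "experiment" or "inference"; several pairs may be given.
    """
    lines = [
        f"{start}\t{stop}\tgene",
        f"\t\t\tlocus_tag\t{locus_tag}",
        f"{start}\t{stop}\tCDS",
        f"\t\t\tproduct\t{locus_tag}",
        f"\t\t\tprotein_id\t{protein_id}",
    ]
    for qualifier, value in evidence:
        if qualifier not in EVIDENCE_QUALIFIERS:
            raise ValueError(f"not an evidence qualifier: {qualifier!r}")
        # Values are written without quotes in the .tbl file, as above.
        lines.append(f"\t\t\t{qualifier}\t{value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(">Feature ExampleSeq")
    # Third CDS of the example: two separate inference qualifiers.
    print(cds_block(400, 500, "Test_0003", "gnl|center_name|Test_0003", [
        ("inference", "similar to RNA sequence, mRNA:INSD:AY123456.2"),
        ("inference", "protein motif:InterPro:IPR001900"),
    ]))
```

Each piece of evidence goes on its own qualifier line, so a feature supported by two inferences, like the third CDS above, simply gets two "inference" lines.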