Format of Sequence Record: User Question and Answer

Compare Accession, Version, and GI number


Sample User Question

 
What is the difference between accession number, version, and GI number in a sequence record? When would I use one versus the other to retrieve a record?
 

Answer

  • What is the difference? (open U49845 in a separate window as an example)

    An accession number (U49845) applies to the complete database record and remains stable even if updates or revisions are made to the record. The version number (U49845.1) and GI number (1293613) are unique identifiers for the sequence data within a record. If any change occurs to the sequence data, no matter how large or small, the version number for that sequence is incremented by one (for example, AF119666.1 became AF119666.2) and a new GI number is assigned.

    • Each string of sequence data within a database record receives both a version number and a GI number.

      • For example, the GenBank sequence record U49845 contains one nucleotide sequence and three amino acid translations (one for each coding sequence, or CDS, feature that is annotated on the nucleotide sequence). The version and GI number given at the top of the record (in the Version field) apply to the current nucleotide sequence. Each amino acid translation also has a version and GI number of its own.

    • The GenBank Sample record provides more detail about each of these identification numbers. It also provides a detailed description of each field in a sequence record.
  • When to use one number rather than the other?

    In the Entrez sequence databases, search with an accession number to retrieve the most recent version of a sequence record, and search with a version number or GI number to retrieve a copy of the record that has a specific version of the sequence data. For example:

    • In the Entrez CoreNucleotide query box, enter AF119666 to retrieve the most recent version of the "Homo sapiens insulin receptor tyrosine kinase substrate mRNA, complete cds."

      • The Locus field shows that the sequence record currently has 2527 bp. The nucleotide sequence currently has a GI number of 34223696 and a version number of AF119666.2.
      • The Comment field lists the earlier GI number, 6563257, and provides a link to that older version of the sequence record, which has 2120 bp. If you follow the link to the older record, its version number is shown as AF119666.1. The Comment field of that record has a warning indicating that the record was replaced with a newer version. In this way, you can go back and forth between old and new versions of the sequence data.

    • To retrieve the older version of the record directly, without first retrieving the current version, enter the old GI number (6563257) or the old version number (AF119666.1) in the Entrez Nucleotide query box.
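
The retrieval rule above can be sketched in a few lines of Python. This is only an illustration: the regular expressions are a simplified assumption about identifier shapes (real accession formats are more varied), the function names are hypothetical, and the NCBI E-utilities `efetch` URL is an assumed programmatic route that this tutorial does not itself describe. The sketch only builds the URL; it does not contact the server.

```python
import re
from urllib.parse import urlencode

# Simplified, assumed patterns; real accession formats are more varied.
ACCESSION_RE = re.compile(r"^[A-Z]{1,2}\d{5,6}$")      # e.g. AF119666
VERSION_RE = re.compile(r"^[A-Z]{1,2}\d{5,6}\.\d+$")   # e.g. AF119666.1
GI_RE = re.compile(r"^\d+$")                           # e.g. 6563257

# Assumption: E-utilities efetch endpoint, not mentioned in the tutorial.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def classify_identifier(identifier: str) -> str:
    """Return 'accession', 'version', or 'gi' for a nucleotide identifier."""
    text = identifier.strip().upper()
    if VERSION_RE.match(text):
        return "version"
    if ACCESSION_RE.match(text):
        return "accession"
    if GI_RE.match(text):
        return "gi"
    raise ValueError(f"unrecognized identifier: {identifier!r}")


def retrieval_note(identifier: str) -> str:
    """Describe which copy of the record a search with this identifier returns."""
    if classify_identifier(identifier) == "accession":
        return "most recent version of the record"
    # Version numbers and GI numbers both pin one specific sequence.
    return "one specific version of the sequence data"


def efetch_url(identifier: str) -> str:
    """Build (but do not request) a GenBank flat-file URL for the identifier."""
    params = {
        "db": "nuccore",
        "id": identifier.strip(),
        "rettype": "gb",
        "retmode": "text",
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    for ident in ("AF119666", "AF119666.1", "6563257"):
        print(ident, "->", classify_identifier(ident), "->", retrieval_note(ident))
```

Checking for the version pattern before the bare accession pattern keeps the two cases from being confused, since a version number is simply an accession with a numeric suffix.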

Comments / Analysis

Sequence records are dynamic. Submitters can update many different types of information in their records.

If some types of changes are made to a record, such as the addition of a reference or biological feature, the correction of a misspelling, etc., only the modification date (in the upper right corner of the sequence record) is changed. The accession number remains the same. In such cases, only the most recent version of the record is available in the database, with no cross-reference or link to an earlier version.

If any change is made to the sequence data, both the version number and GI number change. The accession number remains stable, as always. The change in sequence data can be small (a single base pair is changed or added) or large. The Comment fields of the old and new records cross-reference each other, allowing you to retrieve the current or earlier versions of the sequence record.
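
The two kinds of update described above can be modeled with a small Python sketch. It is a toy model, not how the database is implemented: the `SequenceRecord` class, both function names, the placeholder sequence strings, and the dates are all hypothetical. Only the accession, version numbers, and GI numbers in the usage example come from the AF119666 record discussed earlier.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class SequenceRecord:
    """Hypothetical, minimal stand-in for a nucleotide sequence record."""
    accession: str   # stable across every kind of update
    version: int     # changes only when the sequence data changes
    gi: int          # changes only when the sequence data changes
    sequence: str
    modified: date   # changes on every update

    @property
    def accession_version(self) -> str:
        return f"{self.accession}.{self.version}"


def update_annotation(record: SequenceRecord, when: date) -> SequenceRecord:
    """Annotation-only change: only the modification date moves."""
    return replace(record, modified=when)


def update_sequence(record: SequenceRecord, new_sequence: str,
                    new_gi: int, when: date) -> SequenceRecord:
    """Sequence change: bump the version and assign a new GI number."""
    if new_sequence == record.sequence:
        # No change to the sequence data, so identifiers stay as they are.
        return update_annotation(record, when)
    return replace(record, version=record.version + 1, gi=new_gi,
                   sequence=new_sequence, modified=when)


if __name__ == "__main__":
    # Sequence strings and dates are placeholders, not real data.
    old = SequenceRecord("AF119666", 1, 6563257, "ACGT", date(2000, 1, 1))
    new = update_sequence(old, "ACGTAA", 34223696, date(2003, 1, 1))
    print(old.accession_version, old.gi)
    print(new.accession_version, new.gi)
```

Even a one-base difference in `new_sequence` takes the second branch, which mirrors the point that the size of the change does not matter.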

It is helpful for users to be aware of newer and older versions of a sequence. For example, if you use an old GI number to retrieve a sequence that was later revised, the old version of the record will have a warning in the Comment field and will link to the newer version.

A more specific example: let's say you did a BLAST search and retrieved GI number 6563257 as a top hit. A few months later, the submitter of that sequence record added more bases to the sequence. You look at your BLAST results again (not realizing that the sequence has since changed) and retrieve the sequence using the original GI number (6563257) that was reported in the search results. You retrieve the older version of the record, but the Comment field contains a warning that a newer version of the sequence exists. You follow the link to the newer version and see that it provides additional, new sequence data and revised biological annotations.

Conversely, if you simply retrieve the BLAST hit by accession number (AF119666), you will retrieve the newest version. However, if you are re-evaluating your original BLAST search results, you might in fact need to retrieve the exact sequence data that was found by your earlier BLAST search. You can retrieve the older sequence either by following the Comment field's link to the old GI number, or by searching for that old GI number directly.
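
When re-evaluating saved BLAST results, one way to confirm that a retrieved record holds the same sequence data as the original hit is to compare the GI number in its Version field with the GI number from the search results. The Python sketch below assumes the Version field appears in the flat file as a line such as `VERSION     AF119666.2  GI:34223696`; that layout, the regular expression, and the function names are assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional


class VersionInfo(NamedTuple):
    accession: str
    version: int
    gi: Optional[int]  # None if the line carries no GI number


# Assumed layout: "VERSION", whitespace, accession.version, optional "GI:<n>".
VERSION_LINE_RE = re.compile(
    r"^VERSION\s+(?P<acc>[A-Za-z0-9_]+)\.(?P<ver>\d+)(?:\s+GI:(?P<gi>\d+))?\s*$"
)


def parse_version_line(line: str) -> VersionInfo:
    """Split a VERSION line into accession, version number, and GI number."""
    match = VERSION_LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a VERSION line: {line!r}")
    gi = match.group("gi")
    return VersionInfo(match.group("acc"), int(match.group("ver")),
                       int(gi) if gi else None)


def matches_blast_hit(version_line: str, hit_gi: int) -> bool:
    """True if the record holds the same sequence data as the BLAST hit."""
    return parse_version_line(version_line).gi == hit_gi


if __name__ == "__main__":
    current = "VERSION     AF119666.2  GI:34223696"
    # The earlier BLAST search reported GI 6563257, so this is a newer sequence.
    print(parse_version_line(current))
    print(matches_blast_hit(current, 6563257))
```

A mismatch signals that the record retrieved by accession number is a later revision, and that the old GI number or version number is needed to see the sequence the search actually matched.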

Additional Tips

  • GenBank Sample record - provides the answer to this question. It has a detailed description of each field in a sequence record, and also provides detailed descriptions of some of the important data elements.
    • Click on the "Accession" and "Version" field labels to see a description of each field.
    • Click on "GI" in the Version field to see a description of that data element.

  • Sequence Revision History Tool - allows you to see the various GI numbers, version numbers, and update dates for sequences that appeared in a specific GenBank record.


Revised 11/01/2007