Compare Accession, Version, and GI number
|
| Sample User Question |
 |
| |
|
What is the difference between the accession number, version, and GI number in a sequence record?
When would I use one versus the other to retrieve a record?
|
|
|
| Answer |
 |
- What is the difference? (Open U49845 in a separate window as an
example.)
An accession number (U49845) applies to the complete database record
and remains stable even if the record is updated or revised. The
version number (U49845.1) and GI number (1293613) are unique identifiers
for the sequence data within a record. If the sequence data changes in any
way, no matter how large or small, the version number for that sequence
is incremented by one (the suffix after the decimal point goes from .1 to .2,
for example) and a new GI number is assigned.
- Each string of sequence data within a database record receives
both a version number and a GI number.
- For example, the GenBank sequence record U49845 contains one
nucleotide sequence and three amino acid translations (one for each
coding sequence, or CDS, feature that is annotated on the nucleotide sequence).
The version and GI number given at the top of the record (in the Version
field) apply to the current nucleotide sequence. Each amino acid
translation also has a version and GI number of its own.
- The GenBank Sample record (will open in a separate window)
provides more detail about each of these identification numbers. It also provides
a detailed description of each field in a sequence record.
- When should you use one number rather than the other?
In the Entrez sequence databases, search with an accession
number to retrieve the most recent version of a sequence record, and search
with a version number or GI number to retrieve a copy of the record that has a
specific version of the sequence data. For example:
- In the Entrez CoreNucleotide query box, enter AF119666 to
retrieve the most recent version of the "Homo sapiens insulin receptor
tyrosine kinase substrate mRNA, complete cds."
- The Locus field shows that the sequence record currently has 2527
bp. The nucleotide sequence currently has a GI number of 34223696 and
a version number of AF119666.2.
- The Comment field lists the earlier GI number, 6563257,
and provides a link to that older version of the sequence record, with 2120
bp. If you follow the link to the older record, its version number is shown
as AF119666.1. The Comment field of that record has a warning indicating
the record was replaced with a newer version. In this way, you can go back and
forth between old and new versions of sequence data.
- To retrieve the older version of the record directly, without first
retrieving the current version, enter the old GI number (6563257) or the old
version number (AF119666.1) in the Entrez Nucleotide query box.
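
The rule above (an accession number points to the latest record, while a version number or GI number pins one specific sequence) can be sketched as a small classifier. This is only an illustration: the function name `classify_identifier` and the regular expressions are assumptions made for this example, and real accession formats are more varied than these simplified patterns.

```python
import re

# Simplified patterns (assumption): letters, optional underscore, 5+ digits.
_VERSIONED = re.compile(r"^[A-Z]{1,6}_?\d{5,}\.\d+$")   # e.g. AF119666.1
_ACCESSION = re.compile(r"^[A-Z]{1,6}_?\d{5,}$")        # e.g. AF119666
_GI = re.compile(r"^\d+$")                              # e.g. 6563257


def classify_identifier(identifier):
    """Return (kind, what_a_search_retrieves) for a sequence identifier."""
    text = identifier.strip().upper()
    if _VERSIONED.match(text):
        return ("version", "a specific version of the sequence data")
    if _ACCESSION.match(text):
        return ("accession", "the most recent version of the record")
    if _GI.match(text):
        return ("gi", "a specific version of the sequence data")
    raise ValueError(f"unrecognized identifier: {identifier!r}")


if __name__ == "__main__":
    for query in ("AF119666", "AF119666.1", "6563257"):
        kind, retrieves = classify_identifier(query)
        print(f"{query}: {kind}, retrieves {retrieves}")
```

A check like this can help when saving identifiers alongside analysis results: storing a bare accession number does not record which version of the sequence was actually used.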
|
| Comments / Analysis |
 |
Sequence records are dynamic. Submitters can update many different types of
information in their records.
If some types of changes are made to a record, such as the addition of a reference
or biological feature, the correction of a misspelling, etc., only the
modification date (in the upper right corner of the sequence record) is changed.
The accession number remains the same. In such cases, only the most recent
version of the record is available in the database, with no cross-reference or
link to an earlier version.
If any change is made to the sequence data, both the version number and GI number
change.
The accession number remains stable, as always. The change in sequence data can
be small (a single base pair is changed or added) or large. The Comment field of
the old and new records will cross reference each other, allowing you to retrieve
the current or earlier versions of the sequence record.
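
The two update rules described above can be modeled in a few lines. This is a toy sketch, not how the database is implemented: the `SequenceRecord` class, the `update_record` function, and the `new_gi` parameter are hypothetical names introduced for illustration, and in practice GI numbers are assigned by NCBI, not by the caller.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SequenceRecord:
    accession: str   # stable for the life of the record
    version: int     # changes only when the sequence data changes
    gi: int          # changes only when the sequence data changes
    sequence: str
    modified: date   # changes on every update

    @property
    def accession_version(self) -> str:
        return f"{self.accession}.{self.version}"


def update_record(record: SequenceRecord, modified: date,
                  sequence: Optional[str] = None,
                  new_gi: Optional[int] = None) -> SequenceRecord:
    """Return the updated record, leaving the original untouched."""
    if sequence is None or sequence == record.sequence:
        # Annotation-only change: only the modification date moves.
        return replace(record, modified=modified)
    if new_gi is None:
        raise ValueError("a sequence change requires a newly assigned GI number")
    # Sequence change: version goes up by one and a new GI is attached.
    # The accession number is never modified.
    return replace(record, version=record.version + 1, gi=new_gi,
                   sequence=sequence, modified=modified)
```

Because the record is immutable, the old and new versions exist side by side, which mirrors the way the old and new sequence records cross-reference each other.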
It is helpful for users to be aware of newer and older versions of a sequence.
For example, if you use an old GI number to retrieve a sequence that was later
revised, the old version of the record will have a warning in the Comment field
and will link to the newer version. A more specific example:
Let's say you did a BLAST search and retrieved GI number 6563257 as a top hit.
A few months later, the submitter of that sequence record added more bases to the
sequence. You look at your BLAST results again (not realizing that the sequence
has since changed) and retrieve the sequence using the original GI number
(6563257) that was reported in the search results. You retrieve the older version
of the record, but the Comment field contains a warning that a newer version of
the sequence exists. You follow the link to the newer version and see that it
provides additional, new sequence data and revised biological annotations.
Conversely, if you simply retrieve the BLAST hit by accession number
(AF119666), you will retrieve the newest version. However, if you are
re-evaluating your original BLAST search results, you might in fact need to
retrieve the exact sequence data that was found by your earlier BLAST search. You
can retrieve the older sequence either by following the Comment field's link to
the old GI number, or by searching for that old GI number directly.
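
When re-evaluating saved search results, one way to notice that a hit has since been revised is to compare the saved version number with the Version field of the record you just retrieved. The sketch below assumes the Version field appears in the flat file as a line of the form `VERSION     AF119666.2  GI:34223696`; the function names are hypothetical, and the GI part is treated as optional as a precaution.

```python
import re

# Assumed layout: "VERSION", whitespace, accession.version, optional "GI:number".
_VERSION_LINE = re.compile(r"^VERSION\s+(\S+)\.(\d+)(?:\s+GI:(\d+))?\s*$")


def parse_version_line(line):
    """Split a VERSION line into (accession, version, gi_or_None)."""
    match = _VERSION_LINE.match(line)
    if match is None:
        raise ValueError(f"not a VERSION line: {line!r}")
    accession, version, gi = match.groups()
    return accession, int(version), int(gi) if gi else None


def is_same_version(saved_hit, version_line):
    """True if a saved accession.version matches the retrieved record."""
    accession, version, _ = parse_version_line(version_line)
    return saved_hit.strip() == f"{accession}.{version}"


if __name__ == "__main__":
    retrieved = "VERSION     AF119666.2  GI:34223696"
    # The saved hit refers to the older sequence, so this prints False.
    print(is_same_version("AF119666.1", retrieved))
```

If the comparison fails, the Comment field links described above are the way to move between the older and newer records.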
|
| Additional Tips |
 |
- GenBank Sample record - provides the answer to this
question. It describes each field in a sequence record in detail, as well as
some of the important data elements within those fields.
- click on the "Accession" and "Version" field labels to see a
description of each field
- click on "GI" in the Version field to see a description of that
data element
- Sequence Revision History Tool - allows you to see the
various GI numbers, version numbers, and update dates for sequences that have
appeared in a specific GenBank record
|
|