Identifiers in ClinVar

Accession numbers
- SCV
- RCV
- VCV
Identifiers specific to ClinVar
- Variation ID
- Allele ID
- Others
Identifiers specific to other NCBI resources
Identifiers specific to resources outside of NCBI

Accession numbers

ClinVar assigns accession numbers to its records. Each accession number consists of 3 letters followed by 9 digits. The letters are:

SCV (Submitted record in ClinVar)
RCV (Reference ClinVar record - data aggregated by variant-condition pair)
VCV (Variation ClinVar record - data aggregated by variant)

Each accession number is assigned a version number.

The version on an SCV is incremented when the submitter updates the record.
- Most updates involve changes to data such as the classification or the date last evaluated.
- A submitter may also re-submit a record in which nothing changes except the date of submission. This type of update also results in a version change.
The version on an RCV or VCV is incremented when the content of the record changes because SCV accessions on which it is based are added, updated, or deleted.
- ClinVar staff edit data on RCV and VCV records, such as rs numbers and HGVS expressions, but these changes do not result in a version change.
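The accession pattern and version rules above can be checked mechanically. The shell function below is a hypothetical helper, not a ClinVar tool; it splits an accession into its 3-letter prefix, 9-digit number, and version. It assumes, as displayed on ClinVar pages, that a version, when present, follows the accession after a dot.

```shell
# Hypothetical helper: split a ClinVar accession into prefix, 9-digit
# number, and optional version. Assumes the version, when present,
# follows a dot (e.g. SCV000020145.1). The ( ) body keeps variables local.
parse_clinvar_accession() (
  acc=$1
  # 3 letters (SCV, RCV or VCV), 9 digits, optional ".version"
  if ! printf '%s\n' "$acc" | grep -Eq '^(SCV|RCV|VCV)[0-9]{9}(\.[0-9]+)?$'; then
    echo "not a ClinVar accession: $acc" >&2
    exit 1
  fi
  prefix=$(printf '%s\n' "$acc" | cut -c1-3)
  number=$(printf '%s\n' "$acc" | cut -c4-12)
  case $acc in
    *.*) version=${acc#*.} ;;   # text after the dot
    *) version=none ;;          # no version given
  esac
  printf '%s %s %s\n' "$prefix" "$number" "$version"
)
```

For example, parse_clinvar_accession SCV000020145 prints SCV 000020145 none.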

SCV

Web display

'SCV' refers to the first 3 letters of the accession number assigned to a submitted record in ClinVar, e.g. SCV000020145.

If you search ClinVar for an SCV accession, e.g. SCV000020145, you are automatically directed to the VCV page that includes that submitted record. The SCV accession number and version are displayed in one of the Submissions sections, either Submissions - Germline or Submissions - Somatic, as appropriate.

XML releases

The SCV accession is represented in the ClinVar XML files as follows.

ClinVarVCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/)

SCV accession: /ClinicalAssertion/ClinVarAccession/@Accession
version for the SCV accession: /ClinicalAssertion/ClinVarAccession/@Version

ClinVarRCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_release/)

SCV accession: /ClinVarAssertion/ClinVarAccession/@Acc
version for the SCV accession: /ClinVarAssertion/ClinVarAccession/@Version

ClinVarVariationRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/VCV_xml_old_format/)

SCV accession: /ClinicalAssertion/ClinVarAccession/@Accession
version for the SCV accession: /ClinicalAssertion/ClinVarAccession/@Version

ClinVarFullRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_xml_old_format/)

SCV accession: /ClinVarAssertion/ClinVarAccession/@Acc
version for the SCV accession: /ClinVarAssertion/ClinVarAccession/@Version

See the README file for more information about our XML files.

Files in the tab_delimited directory

There are two files in the tab-delimited directory on our ftp site that contain information specific to submitted records.

submission_summary.txt.gz
- an overview of classification, conditions, and observations reported in the current version of each submitted record
summary_of_conflicting_interpretations.txt
- all pairwise differences in classification of a variant, without regard to condition
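Both files are tab-delimited with labeled columns. As a sketch only, the hypothetical function below prints the SCV accessions listed for one Variation ID in submission_summary.txt.gz, read from standard input. It assumes, without confirming, that the column header line begins with '#', that the first column holds the Variation ID, and that one column is labeled SCV; check these against the README before relying on it.

```shell
# Hypothetical helper: print the SCV accessions for one Variation ID
# from submission_summary (read on stdin).
# Assumptions: tab-delimited; header line starts with '#'; column 1 is
# the Variation ID; one column is labeled "SCV".
scvs_for_variation() {
  awk -F '\t' -v id="$1" '
    /^#/ {
      # Strip the leading "#" and remember which column is labeled SCV.
      sub(/^#/, "")
      for (i = 1; i <= NF; i++) if ($i == "SCV") col = i
      next
    }
    col && $1 == id { print $col }
  '
}
```

Usage:

zcat submission_summary.txt.gz | scvs_for_variation 561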

VCF files

The SCV accession is not reported in ClinVar's VCF files.

RCV

Web display

'RCV' refers to the first 3 letters of the accession that ClinVar assigns to a record aggregating information from all submitted records that classify the same variant for the same condition.

If you submit a query to ClinVar based on an RCV accession, e.g. RCV000009910, you are directed to the page specific to that record. The Assertion and evidence details section of this record lists all the supporting submitted records with their SCV accessions.

XML releases

The RCV accession is represented in the ClinVar XML files as follows.

ClinVarVCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/)

RCV accession: /RCVList/RCVAccession/@Accession
version for the RCV accession: /RCVList/RCVAccession/@Version

ClinVarRCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_release/)

RCV accession: /ReferenceClinVarAssertion/ClinVarAccession/@Acc
version for the RCV accession: /ReferenceClinVarAssertion/ClinVarAccession/@Version

ClinVarVariationRelease (ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/VCV_xml_old_format/)

RCV accession: //RCVList/RCVAccession/@Accession
version for the RCV accession: //RCVList/RCVAccession/@Version

ClinVarFullRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_xml_old_format/)

RCV accession: /ReferenceClinVarAssertion/ClinVarAccession/@Acc
version for the RCV accession: /ReferenceClinVarAssertion/ClinVarAccession/@Version

See the README file for more information about our XML releases.

Files in the tab_delimited directory

variant_summary.txt.gz includes a column, RCVaccession, that lists the RCV accessions for the variant.
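A minimal sketch for listing the RCV accessions of one variant follows. The function name is hypothetical, and several details are assumptions rather than facts from this page: the header line starts with '#', the columns are labeled RCVaccession and VariationID, multiple accessions in one cell are separated by '|', and the same variant may appear on more than one row (for example, once per genome assembly), so duplicates are suppressed.

```shell
# Hypothetical helper: list RCV accessions for one Variation ID from
# variant_summary (read on stdin).
# Assumptions: tab-delimited; header starts with '#'; columns labeled
# "RCVaccession" and "VariationID"; accessions in a cell separated by "|".
rcvs_for_variation() {
  awk -F '\t' -v id="$1" '
    /^#/ {
      sub(/^#/, "")
      for (i = 1; i <= NF; i++) {
        if ($i == "RCVaccession") rcol = i
        if ($i == "VariationID") vcol = i
      }
      next
    }
    rcol && vcol && $vcol == id {
      n = split($rcol, acc, "|")
      # Print each accession once, even if the row is repeated.
      for (j = 1; j <= n; j++)
        if (!(acc[j] in seen)) { seen[acc[j]] = 1; print acc[j] }
    }
  '
}
```

Usage:

zcat variant_summary.txt.gz | rcvs_for_variation 9325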

VCV

Web display

'VCV' refers to the first 3 letters of the accession that ClinVar assigns to a record aggregating information from all submitted records that classify the same variant.

If you submit a query to ClinVar based on a VCV accession, e.g. VCV000009325, you are directed to the page specific to that record. The Submissions - Germline and Submissions - Somatic sections of this page list all the supporting submitted records with their SCV accessions.

XML releases

The VCV accession is represented in the ClinVar XML files as follows.

ClinVarVCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/)

VCV accession: //ClinVarVariationRelease/VariationArchive/@Accession
version for the VCV accession: //ClinVarVariationRelease/VariationArchive/@Version

ClinVarRCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_release/)

VCV accession: //ReferenceClinVarAssertion/MeasureSet/@Acc
version for the VCV accession: //ReferenceClinVarAssertion/MeasureSet/@Version

ClinVarVariationRelease (ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/VCV_xml_old_format/)

VCV accession: //ClinVarVariationRelease/VariationArchive/@Accession
version for the VCV accession: //ClinVarVariationRelease/VariationArchive/@Version

ClinVarFullRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_xml_old_format/)

VCV accession: //ReferenceClinVarAssertion/MeasureSet/@Acc
version for the VCV accession: //ReferenceClinVarAssertion/MeasureSet/@Version

See the README file for more information about our XML files.

VCF and files in the tab_delimited directory

VCV accessions are not reported in the VCF files or in any of the tab-delimited files. However, the VCV accession is constructed from the Variation ID by prefixing 'VCV' and padding the number with leading zeros to nine digits (e.g. Variation ID 9325 corresponds to VCV000009325). The Variation ID can therefore be used in lieu of the VCV accession; it is reported in the ID column of the VCF files and in the following tab-delimited files:

variant_summary.txt.gz
var_citations.txt
summary_of_conflicting_interpretations.txt
hgvs4variation.txt.gz
variation_allele.txt.gz
submission_summary.txt.gz
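The rule for building a VCV accession from a Variation ID can be written as two small shell functions. The names are hypothetical; the reverse conversion also drops an optional '.version' suffix, on the assumption that a version follows the accession after a dot.

```shell
# Hypothetical helpers for the construction rule described above:
# 'VCV' + Variation ID left-padded with zeros to nine digits.
variation_id_to_vcv() {
  printf 'VCV%09d\n' "$1"
}

vcv_to_variation_id() (
  acc=${1%%.*}          # drop an optional ".version" suffix
  digits=${acc#VCV}     # drop the 'VCV' prefix
  # awk converts the digit string to a number, removing leading zeros
  # (shell arithmetic could misread leading zeros as octal).
  printf '%s\n' "$digits" | awk '{ print $1 + 0 }'
)
```

For example, variation_id_to_vcv 9325 prints VCV000009325, and vcv_to_variation_id VCV000009325 prints 9325.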

See the README file for more information about our tab-delimited files.

Identifiers specific to ClinVar

Variation ID

ClinVar assigns a unique integer identifier, the Variation ID, to each set of variants described in submitted records. Most submitted records classify a single variant, and a Variation ID is assigned even when the set contains only one variant.

There are two subclasses of Variation IDs:

for a variant that is classified directly (classified)
for a variant that is not classified directly, but only as part of a larger set of variants (included)

The majority of Variation IDs in ClinVar are for classified variants, meaning that they were the focus of a submitted record, with the classification of that variant provided by the submitter. However, some submitted records describe a compound heterozygote, a haplotype, or a diplotype for which ClinVar has not received independent submissions classifying each individual variant. The individual variants within such a set, which lack a direct classification, are represented by Variation IDs of the 'included' class.

Consider the example of Variation ID 561, which represents a haplotype with three variants. Two of the variants in the haplotype, Allele IDs 38382 and 15600, do not have a submitted record in ClinVar that classifies them directly, so each is considered "included". The third variant, Allele ID 38381, does have submitted records that classify it directly, so it is considered a classified variant. The Variation IDs assigned to the individual variants (242755, 242821, and 242756, respectively) are available in the ClinVarVCVRelease XML files. The correspondences between Allele ID and Variation ID are in variation_allele.txt.gz.

Web display

The Variation ID anchors ClinVar's VCV page, which describes a set of variants (although most sets contain only one variant). Take, for example, Variation ID 561:

the value 561 in the URL https://www.ncbi.nlm.nih.gov/clinvar/variation/561/ is the Variation ID.
the value 561 has been assigned to the set of 3 distinct simple variants reported in the Allele(s) section.
The Variation ID is also displayed explicitly on the VCV page.
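Because the Variation ID is the last path segment of the VCV page URL, the URL can be built directly from it. The function below is a hypothetical sketch that follows only the URL pattern shown above and rejects input that is not a plain integer.

```shell
# Hypothetical helper: build the VCV page URL for a Variation ID,
# following the URL pattern shown above.
clinvar_variation_url() {
  case $1 in
    ''|*[!0-9]*)                      # reject empty or non-numeric input
      echo "not a Variation ID: $1" >&2
      return 1 ;;
  esac
  printf 'https://www.ncbi.nlm.nih.gov/clinvar/variation/%s/\n' "$1"
}
```

For example, clinvar_variation_url 561 prints https://www.ncbi.nlm.nih.gov/clinvar/variation/561/.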

XML files

The Variation ID is represented in the ClinVar XML files as follows.

ClinVarVCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/)

/VariationArchive/@VariationID
/SimpleAllele/@VariationID
/Haplotype/@VariationID
/Genotype/@VariationID

ClinVarRCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_release/)

//MeasureSet/@ID

ClinVarVariationRelease (ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/VCV_xml_old_format/)

/VariationArchive/@VariationID
/SimpleAllele/@VariationID
/Haplotype/@VariationID
/Genotype/@VariationID

ClinVarFullRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_xml_old_format/)

//MeasureSet/@ID

Reports in the tab_delimited directory

There are multiple files in the tab-delimited directory that reference the Variation ID. The column containing those values is clearly labeled. The file reporting the relationships between VariationID and AlleleID is variation_allele.txt.gz.

variant_summary.txt.gz
var_citations.txt
summary_of_conflicting_interpretations.txt
hgvs4variation.txt.gz
variation_allele.txt.gz
submission_summary.txt.gz

VCF files

The Variation ID is reported in the ID column (column 3) of the VCF files.
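A minimal sketch for pulling the record for one Variation ID out of a ClinVar VCF by matching column 3 follows. The function name is hypothetical, and the file name clinvar.vcf.gz in the usage line is an assumption; header lines, which start with '#', are skipped.

```shell
# Hypothetical helper: print the VCF record(s) whose ID column
# (column 3) equals the given Variation ID. Reads the VCF on stdin.
vcf_records_for_variation() {
  awk -F '\t' -v id="$1" '!/^#/ && $3 == id'
}
```

Usage, assuming the VCF file is named clinvar.vcf.gz:

zcat clinvar.vcf.gz | vcf_records_for_variation 242756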

Allele ID

A unique integer identifier, the Allele ID, is assigned to each individual variant in ClinVar. The numbering systems for the Allele ID and the Variation ID described above overlap, so the same integer can occur both as an Allele ID and as a Variation ID, referring to different things. Always note the context of any integer identifier.

XML releases

The Allele ID is represented in the ClinVar XML files as follows.

ClinVarVCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/)

/SimpleAllele/@ID
/Haplotype/@ID
/Genotype/@ID

ClinVarRCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_release/)

//Measure/@ID

ClinVarVariationRelease (ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/VCV_xml_old_format/)

/SimpleAllele/@ID
/Haplotype/@ID
/Genotype/@ID

ClinVarFullRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_xml_old_format/)

//Measure/@ID

Files in the tab_delimited directory

There are multiple files in the tab-delimited directory that reference the Allele ID. The column containing the Allele ID is clearly labeled. The file reporting the relationships between VariationID and AlleleID is variation_allele.txt.gz.

allele_gene.txt
cross_references.txt
hgvs4variation.txt.gz
variant_summary.txt.gz
var_citations.txt
variation_allele.txt.gz

VCF files

The Allele ID is reported in the VCF files as the ALLELEID INFO tag.
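Since the VCF carries the Variation ID in column 3 and the Allele ID in the INFO column (column 8, a semicolon-separated list of KEY=VALUE entries), the two can be paired per record. The function below is a hypothetical sketch of that extraction, not an official ClinVar utility.

```shell
# Hypothetical helper: print "Variation ID <tab> Allele ID" for each
# VCF record, reading the ID column (3) and the ALLELEID key of INFO (8).
vcf_variation_allele_pairs() {
  awk -F '\t' -v OFS='\t' '
    /^#/ { next }                       # skip header lines
    {
      n = split($8, info, ";")
      for (i = 1; i <= n; i++)
        if (index(info[i], "ALLELEID=") == 1)
          print $3, substr(info[i], 10) # text after "ALLELEID="
    }
  '
}
```

Usage, assuming the VCF file is named clinvar.vcf.gz:

zcat clinvar.vcf.gz | vcf_variation_allele_pairs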

Relationships between Variation ID and Allele ID

A Variation ID represents one or more Allele IDs, and any Allele ID may be a component of one or more sets of variants (i.e. Variation IDs). For example, suppose one submission classifies a single nucleotide variant (Allele ID n, assigned Variation ID a), and a different submission classifies that same variant (Allele ID n) in combination with a different single nucleotide variant (Allele ID m); the combination is assigned Variation ID b. Allele ID n is then a component of both Variation ID a and Variation ID b. The correspondences between Variation ID and Allele ID are in variation_allele.txt.gz.

To find all Allele IDs represented by a Variation ID, try something like

zcat variation_allele.txt.gz | awk -F '\t' '$1 == 561'

which finds all lines in which the first column (Variation ID) is 561. The output shows that 561 corresponds to a Haplotype with a classification (yes in column 4), and the third column lists its Allele IDs.

561     Haplotype       15600   yes
561     Haplotype       38381   yes
561     Haplotype       38382   yes

To find all Variation IDs represented by an Allele ID, try something like

zcat variation_allele.txt.gz | awk -F '\t' '$3 == 15600'

which finds all lines in which the third column (Allele ID) is 15600. The output shows that 15600 is part of a Haplotype with a classification (yes in column 4), and that it is also represented on its own by Variation ID 242821 (a set with a single variant), which does not have a classification in ClinVar.

561     Haplotype       15600   yes
242821  Variant         15600   no

Other identifiers in ClinVar's XML files

ClinVar's XML files report integer @ID values for many elements other than MeasureSet and Measure. These values correspond to the unique keys of the relational database tables that ClinVar uses to represent the data. At present, these values can be used to track an element, e.g. //Trait/@ID, from one report to another, but ClinVar does not consider them public identifiers and reserves the right to change the numbering system.

Identifiers specific to other NCBI resources

ClinVar maintains identifiers for multiple other NCBI resources, including Bookshelf, dbSNP, dbVar, Gene, MedGen (CUI), PubMed, and PubMed Central.

In the XML, these are reported in the XRef element.
In the tab-delimited directory, these are reported in
- cross_references.txt
- var_citations.txt

Identifiers specific to resources outside of NCBI

ClinVar maintains identifiers for multiple resources outside of NCBI.

In the XML, these are reported in the XRef element.
In the tab-delimited directory, these are reported in
- cross_references.txt
- var_citations.txt
In the main clinvar directory, gene_condition_source_id reports the gene-disease relationships used in ClinVar, Gene, GTR, and MedGen.
