Identifiers in ClinVar
- Accession numbers
- Identifiers specific to ClinVar
- Identifiers specific to other NCBI resources
- Identifiers specific to resources outside of NCBI
Accession numbers
ClinVar assigns accession numbers to its records. Each accession number consists of a 3-letter prefix followed by 9 digits. The prefixes are:
- SCV (Submitted record in ClinVar)
- RCV (Reference ClinVar record - data aggregated by variant-condition pair)
- VCV (Variation ClinVar record - data aggregated by variant)
Each accession number is assigned a version number.
- The version on an SCV is incremented when the submitter updates the record.
  - Most updates involve changes to data such as the classification or the date last evaluated.
  - A submitter may also update, or re-submit, a record in which nothing changes except the date of submission. This type of update also results in a version change.
- The version on an RCV or VCV is incremented when the content of the record changes because the SCV records on which it is based are added, updated, or deleted.
- ClinVar staff may edit data on RCV and VCV records, such as rs numbers and HGVS expressions, but these edits do not result in a version change.
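The accession pattern described above can be checked mechanically. The sketch below is a minimal Python example; parse_accession is a hypothetical helper, not a ClinVar API, and the optional ".version" suffix (as in VCV000009325.2) is an assumption about how an accession and its version are often written together.

```python
import re
from typing import NamedTuple, Optional

# 3-letter prefix (SCV, RCV or VCV), exactly 9 ASCII digits, and an
# assumed optional ".version" suffix.
ACCESSION_RE = re.compile(r"^(SCV|RCV|VCV)([0-9]{9})(?:\.([0-9]+))?$")


class Accession(NamedTuple):
    prefix: str             # "SCV", "RCV" or "VCV"
    number: int             # the 9 digits as an integer
    version: Optional[int]  # None when no version suffix was given


def parse_accession(text: str) -> Accession:
    """Split a ClinVar accession such as 'SCV000020145' into its parts."""
    match = ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a ClinVar accession: {text!r}")
    prefix, digits, version = match.groups()
    return Accession(prefix, int(digits), int(version) if version else None)


print(parse_accession("SCV000020145"))
# Accession(prefix='SCV', number=20145, version=None)
```

In the XML files the version is stored separately (@Version), so the suffix handling is only needed for text that combines the two.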
SCV
Web display
'SCV' refers to the first 3 letters of the accession number assigned to a submitted record in ClinVar, e.g. SCV000020145.
If you search ClinVar for an SCV accession, e.g. SCV000020145, you are redirected automatically to the VCV page that includes that submitted record. The SCV accession and version are displayed in the appropriate Submissions section, either Submissions - Germline or Submissions - Somatic.
XML releases
The SCV accession is represented in the ClinVar XML files as follows.
ClinVarVCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/)
- SCV accession /ClinicalAssertion/ClinVarAccession/@Accession
- version for the SCV accession /ClinicalAssertion/ClinVarAccession/@Version
ClinVarRCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_release/)
- SCV accession /ClinVarAssertion/ClinVarAccession/@Acc
- version for the SCV accession /ClinVarAssertion/ClinVarAccession/@Version
ClinVarVariationRelease (https://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/clinvar_variation/)
- SCV accession /ClinicalAssertion/ClinVarAccession/@Accession
- version for the SCV accession /ClinicalAssertion/ClinVarAccession/@Version
ClinVarFullRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_xml_old_format/)
- SCV accession /ClinVarAssertion/ClinVarAccession/@Acc
- version for the SCV accession /ClinVarAssertion/ClinVarAccession/@Version
See the README file for more information about our XML files.
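As an illustration of the ClinVarVCVRelease paths above, the following minimal Python sketch uses the standard-library ElementTree to collect SCV accessions and versions from one VariationArchive record. The XML fragment is hypothetical and heavily trimmed; real records may nest ClinicalAssertion more deeply, which is why the code searches all descendants rather than fixed paths.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed fragment; the values are illustrative only.
SAMPLE = """
<VariationArchive>
  <ClinicalAssertion>
    <ClinVarAccession Accession="SCV000020145" Version="1"/>
  </ClinicalAssertion>
</VariationArchive>
"""


def scv_accessions(archive: ET.Element) -> list:
    """Return (accession, version) pairs for every ClinicalAssertion."""
    pairs = []
    # iter() walks all descendants, so the exact nesting depth does not matter.
    for assertion in archive.iter("ClinicalAssertion"):
        accession = assertion.find("ClinVarAccession")
        if accession is not None:
            pairs.append((accession.get("Accession"), int(accession.get("Version"))))
    return pairs


print(scv_accessions(ET.fromstring(SAMPLE)))  # [('SCV000020145', 1)]
```

In the ClinVarRCVRelease and ClinVarFullRelease files the parent element is ClinVarAssertion and the attribute is @Acc rather than @Accession, so the element and attribute names would need to change.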
Files in the tab_delimited directory
There are two files in the tab-delimited directory on our ftp site that contain information specific to submitted records.
- submission_summary.txt.gz: an overview of the classification, conditions, and observations reported in the current version of each submitted record
- summary_of_conflicting_interpretations.txt: all pairwise differences in classification of a variant, without regard to condition
VCF files
The SCV accession is not reported in ClinVar's VCF files.
RCV
Web display
'RCV' refers to the first 3 letters of the accession number that ClinVar assigns to a record aggregating information from all submitted records that classify the same variant for the same condition.
If you search ClinVar for an RCV accession, e.g. RCV000009910, you are directed to the page for that record. Its Assertion and evidence details section lists all the supporting submitted records with their SCV accessions.
XML releases
The RCV accession is represented in the ClinVar XML files as follows.
ClinVarVCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/)
- RCV accession /RCVList/RCVAccession/@Accession
- version for the RCV accession /RCVList/RCVAccession/@Version
ClinVarRCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_release/)
- RCV accession /ReferenceClinVarAssertion/ClinVarAccession/@Acc
- version for the RCV accession /ReferenceClinVarAssertion/ClinVarAccession/@Version
ClinVarVariationRelease (ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/VCV_xml_old_format/)
- RCV accession //RCVList/RCVAccession/@Accession
- version for the RCV accession //RCVList/RCVAccession/@Version
ClinVarFullRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_xml_old_format/)
- RCV accession /ReferenceClinVarAssertion/ClinVarAccession/@Acc
- version for the RCV accession /ReferenceClinVarAssertion/ClinVarAccession/@Version
See the README file for more information about our XML releases.
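In the ClinVarRCVRelease paths above, RCV and SCV accessions share the same element and attribute (ClinVarAccession/@Acc); only the parent element tells them apart. The minimal Python sketch below shows that distinction. The ClinVarSet wrapper element in the fragment is an assumption about how these records are grouped, and all values are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed fragment; the ClinVarSet wrapper is an assumption.
SAMPLE = """
<ClinVarSet>
  <ReferenceClinVarAssertion>
    <ClinVarAccession Acc="RCV000009910" Version="1"/>
  </ReferenceClinVarAssertion>
  <ClinVarAssertion>
    <ClinVarAccession Acc="SCV000020145" Version="1"/>
  </ClinVarAssertion>
</ClinVarSet>
"""


def accessions_under(element: ET.Element, path: str) -> list:
    """Return (Acc, Version) pairs for every ClinVarAccession matching path."""
    return [(a.get("Acc"), int(a.get("Version"))) for a in element.findall(path)]


record = ET.fromstring(SAMPLE)
# Both kinds use ClinVarAccession/@Acc, so the parent element decides the kind.
rcv = accessions_under(record, "ReferenceClinVarAssertion/ClinVarAccession")
scv = accessions_under(record, "ClinVarAssertion/ClinVarAccession")
print(rcv, scv)  # [('RCV000009910', 1)] [('SCV000020145', 1)]
```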
Files in the tab_delimited directory
The file variant_summary includes a column, RCVaccession, that lists the RCV accessions for the variant.
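A minimal Python sketch for collecting RCV accessions per variant from variant_summary. It assumes, beyond what is stated here, that the file is a gzipped tab-delimited table with a header row containing VariationID and RCVaccession columns, and that multiple accessions in one cell are separated by "|". Because a variant may appear on more than one row, rows are merged into a set.

```python
import csv
import gzip
from collections import defaultdict


def rcvs_by_variation(path: str) -> dict:
    """Map Variation ID -> set of RCV accessions listed in variant_summary."""
    mapping = defaultdict(set)
    with gzip.open(path, "rt", newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary text in a TSV file.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Assumed separator between accessions in one cell: "|".
            accessions = [a for a in row["RCVaccession"].split("|") if a]
            mapping[int(row["VariationID"])].update(accessions)
    return dict(mapping)


# Usage, assuming a local copy of the file:
# rcvs = rcvs_by_variation("variant_summary.txt.gz")
```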
VCV
Web display
'VCV' refers to the first 3 letters of the accession number that ClinVar assigns to a record aggregating information from all submitted records that classify the same variant, regardless of condition.
If you search ClinVar for a VCV accession, e.g. VCV000009325, you are directed to the page for that record. The Submissions - Germline and Submissions - Somatic sections of this page list all the supporting submitted records with their SCV accessions.
XML releases
The VCV accession is represented in the ClinVar XML files as follows.
ClinVarVCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/)
- VCV accession //ClinVarVariationRelease/VariationArchive/@Accession
- version for the VCV accession //ClinVarVariationRelease/VariationArchive/@Version
ClinVarRCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_release/)
- VCV accession //ReferenceClinVarAssertion/MeasureSet/@Acc
- version for the VCV accession //ReferenceClinVarAssertion/MeasureSet/@Version
ClinVarVariationRelease (ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/VCV_xml_old_format/)
- VCV accession //ClinVarVariationRelease/VariationArchive/@Accession
- version for the VCV accession //ClinVarVariationRelease/VariationArchive/@Version
ClinVarFullRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_xml_old_format/)
- VCV accession //ReferenceClinVarAssertion/MeasureSet/@Acc
- version for the VCV accession //ReferenceClinVarAssertion/MeasureSet/@Version
See the README file for more information about our XML files.
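The XML releases are large, so loading a whole file into memory may be impractical. One common approach, sketched below as an assumption rather than a ClinVar recommendation, is to stream VariationArchive elements with ElementTree's iterparse and read the VCV accession and version from each. The two-record sample file is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_vcv(source):
    """Yield (VCV accession, version) for each VariationArchive in source.

    source may be a file name or a binary file object, e.g. one returned
    by gzip.open(path, "rb").
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "VariationArchive":
            yield elem.get("Accession"), int(elem.get("Version"))
            elem.clear()  # drop the processed record's content to save memory


# Hypothetical two-record file; the versions are illustrative only.
SAMPLE = (b'<ClinVarVariationRelease>'
          b'<VariationArchive Accession="VCV000009325" Version="1"/>'
          b'<VariationArchive Accession="VCV000000561" Version="2"/>'
          b'</ClinVarVariationRelease>')
print(list(iter_vcv(io.BytesIO(SAMPLE))))
# [('VCV000009325', 1), ('VCV000000561', 2)]
```

Cleared elements remain attached to the root element, so for very large files you may also want to detach them from their parent.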
VCF and files in the tab_delimited directory
VCV accessions are not reported in the VCF files or in any of the tab-delimited files. However, the VCV accession is constructed from the Variation ID by padding it with leading zeros to nine digits and adding the prefix VCV; for example, Variation ID 9325 corresponds to VCV000009325. The Variation ID can therefore be used in place of the VCV accession. It is reported in the ID column of the VCF files and in the following tab-delimited files:
- variant_summary.txt
- var_citations.txt
- summary_of_conflicting_interpretations.txt
- hgvs4variation.txt.gz
- variation_allele.txt
- submission_summary.txt
See the README file for more information about our tab-delimited files.
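The numbering rule for VCV accessions (prefix VCV, Variation ID left-padded with zeros to nine digits) translates directly into code. The two helper functions below are hypothetical names for illustration; ignoring a trailing ".version" is an assumption for inputs that carry one.

```python
def vcv_from_variation_id(variation_id: int) -> str:
    """Build a VCV accession: 'VCV' + Variation ID zero-padded to 9 digits."""
    if not 0 < variation_id < 10**9:
        raise ValueError(f"Variation ID out of range: {variation_id}")
    return f"VCV{variation_id:09d}"


def variation_id_from_vcv(accession: str) -> int:
    """Recover the Variation ID from a VCV accession (version suffix ignored)."""
    core = accession.strip().split(".", 1)[0]
    digits = core[3:]
    if not (core.startswith("VCV") and len(digits) == 9
            and digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a VCV accession: {accession!r}")
    return int(digits)


print(vcv_from_variation_id(9325))            # VCV000009325
print(variation_id_from_vcv("VCV000000561"))  # 561
```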
Identifiers specific to ClinVar
Variation ID
ClinVar assigns a unique integer identifier, the Variation ID, to each set of variants described in submitted records. Most submitted records in ClinVar classify a single variant, and a Variation ID is assigned even when the set contains only one variant.
There are two subclasses of Variation IDs:
- for a variant that is classified directly (classified)
- for a variant that is not classified directly but is included in a larger set of variants that is classified (included)
Most Variation IDs in ClinVar are for classified variants, meaning that the variant was the focus of a submitted record and the submitter provided its classification. However, some submitted records describe a compound heterozygote, a haplotype, or a diplotype, and ClinVar has not received independent submissions classifying each individual variant in the set. The individual variants within such a set that lack a direct classification are represented by Variation IDs of the 'included' class.
For example, Variation ID 561 represents a haplotype of three variants. Two of these variants, Allele IDs 38382 and 15600, have no submitted record in ClinVar classifying them directly, so each is considered "included". The third variant, Allele ID 38381, has submitted records that classify it directly, so it is considered a classified variant. The Variation IDs assigned to the individual variants (242755 for Allele ID 38382, 242821 for Allele ID 15600, and 242756 for Allele ID 38381) are available in the ClinVarVCVRelease XML files. The correspondences between Allele ID and Variation ID are reported in variation_allele.txt.
Web display
The Variation ID anchors ClinVar's VCV page, which describes a set of variants (although most sets contain only one variant). Take, for example, Variation ID 561:
- The value 561 in the URL https://www.ncbi.nlm.nih.gov/clinvar/variation/561/ is the Variation ID.
- The value 561 has been assigned to the set of 3 distinct simple variants reported in the Allele(s) section.
- The Variation ID is also displayed explicitly on the VCV page.
XML releases
The Variation ID is represented in the ClinVar XML files as follows.
ClinVarVCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/)
- /VariationArchive/@VariationID
- /SimpleAllele/@VariationID
- /Haplotype/@VariationID
- /Genotype/@VariationID
ClinVarRCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_release/)
- //MeasureSet/@ID
ClinVarVariationRelease (ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/VCV_xml_old_format/)
- /VariationArchive/@VariationID
- /SimpleAllele/@VariationID
- /Haplotype/@VariationID
- /Genotype/@VariationID
ClinVarFullRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_xml_old_format/)
- //MeasureSet/@ID
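To illustrate the ClinVarVCVRelease attributes listed above, the minimal Python sketch below reads @VariationID from the VariationArchive, Genotype, Haplotype, and SimpleAllele elements of one record. The fragment is hypothetical and trimmed: it only mirrors the Variation ID 561 example given earlier, and the nesting of SimpleAllele inside Haplotype is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed fragment mirroring the Variation ID 561 example.
SAMPLE = """
<VariationArchive VariationID="561">
  <Haplotype VariationID="561">
    <SimpleAllele VariationID="242755"/>
    <SimpleAllele VariationID="242821"/>
    <SimpleAllele VariationID="242756"/>
  </Haplotype>
</VariationArchive>
"""


def variation_ids(archive: ET.Element) -> dict:
    """Map element tag -> list of @VariationID values found in one record."""
    found = {"VariationArchive": [int(archive.get("VariationID"))]}
    for tag in ("Genotype", "Haplotype", "SimpleAllele"):
        ids = [int(e.get("VariationID")) for e in archive.iter(tag)
               if e.get("VariationID") is not None]
        if ids:
            found[tag] = ids
    return found


print(variation_ids(ET.fromstring(SAMPLE)))
# {'VariationArchive': [561], 'Haplotype': [561], 'SimpleAllele': [242755, 242821, 242756]}
```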
Files in the tab_delimited directory
Several files in the tab-delimited directory report the Variation ID, each in a clearly labeled column. The file that reports the relationships between Variation ID and Allele ID is variation_allele.txt.gz. The files reporting the Variation ID include:
- variant_summary.txt
- var_citations.txt
- summary_of_conflicting_interpretations.txt
- hgvs4variation.txt.gz
- variation_allele.txt
- submission_summary.txt
VCF files
The Variation ID is reported in the ID column (column 3) of the VCF files.
Allele ID
ClinVar assigns a unique integer identifier, the Allele ID, to each individual variant. The Allele ID and Variation ID numbering systems overlap, so the same integer may occur as both an Allele ID and a Variation ID; always note which kind of identifier an integer represents.
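One way to keep the two overlapping numbering systems apart in code is to give each its own type. The sketch below uses Python's typing.NewType; the type names and the alleles_for helper are hypothetical. NewType adds no runtime checks, so the protection comes only from a static type checker such as mypy.

```python
from typing import Dict, List, NewType

VariationID = NewType("VariationID", int)
AlleleID = NewType("AlleleID", int)


def alleles_for(index: Dict[VariationID, List[AlleleID]],
                variation_id: VariationID) -> List[AlleleID]:
    """Look up the Allele IDs of one Variation ID (empty list if unknown)."""
    return index.get(variation_id, [])


# Index built from the Variation ID 561 example in this document.
index = {VariationID(561): [AlleleID(15600), AlleleID(38381), AlleleID(38382)]}
print(alleles_for(index, VariationID(561)))  # [15600, 38381, 38382]
# alleles_for(index, AlleleID(15600)) would be flagged by a static type checker.
```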
XML releases
The Allele ID is represented in the ClinVar XML files as follows.
ClinVarVCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/)
- /SimpleAllele/@ID
- /Haplotype/@ID
- /Genotype/@ID
ClinVarRCVRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_release/)
- //Measure/@ID
ClinVarVariationRelease (ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/VCV_xml_old_format/)
- /SimpleAllele/@ID
- /Haplotype/@ID
- /Genotype/@ID
ClinVarFullRelease (ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/RCV_xml_old_format/)
- //Measure/@ID
Files in the tab_delimited directory
Several files in the tab-delimited directory report the Allele ID, each in a clearly labeled column. The file that reports the relationships between Variation ID and Allele ID is variation_allele.txt.gz. The files reporting the Allele ID include:
- allele_gene.txt
- cross_references.txt
- hgvs4variation.txt.gz
- variant_summary.txt.gz
- var_citations.txt
- variation_allele.txt.gz
VCF files
The Allele ID is reported in the VCF files as the ALLELEID INFO tag.
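Both VCF locations, the ID column for the Variation ID and the ALLELEID INFO tag for the Allele ID, can be read from a single data line. The sketch below assumes a standard tab-separated VCF data line with INFO in column 8 made of semicolon-separated KEY=VALUE entries; the example line is hypothetical except for the Variation ID 242821 / Allele ID 15600 pairing taken from this document.

```python
from typing import Optional, Tuple


def vcf_clinvar_ids(line: str) -> Tuple[int, Optional[int]]:
    """Return (Variation ID, Allele ID) from one ClinVar VCF data line."""
    fields = line.rstrip("\n").split("\t")
    variation_id = int(fields[2])       # column 3: ID
    info = {}
    for entry in fields[7].split(";"):  # column 8: INFO
        key, _, value = entry.partition("=")
        info[key] = value
    allele_id = info.get("ALLELEID")
    return variation_id, int(allele_id) if allele_id else None


# Hypothetical line: CHROM, POS, REF and ALT are placeholders.
line = "1\t1000\t242821\tA\tG\t.\t.\tALLELEID=15600"
print(vcf_clinvar_ids(line))  # (242821, 15600)
```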
Relationships between Variation ID and Allele ID
A Variation ID represents one or more Allele IDs, and any Allele ID may be a component of one or more sets of discrete variants (that is, of one or more Variation IDs). For example, suppose one submission classifies a single nucleotide variant (Allele ID n), which is assigned Variation ID a, and a different submission classifies that same variant (Allele ID n) in combination with a second single nucleotide variant (Allele ID m), the combination being assigned Variation ID b. Allele ID n then belongs to both Variation ID a and Variation ID b. The correspondences between Variation ID and Allele ID are in variation_allele.txt.gz.
To find all Allele IDs represented by a Variation ID, try something like
zcat variation_allele.txt.gz | awk -F'\t' '$1==561 {print}'
This command prints every line whose first column (Variation ID) is 561. The output shows that Variation ID 561 is a Haplotype with a classification (yes in column 4) and lists its Allele IDs in column 3:
561 Haplotype 15600 yes
561 Haplotype 38381 yes
561 Haplotype 38382 yes
To find all Variation IDs represented by an Allele ID, try something like
zcat variation_allele.txt.gz | awk -F'\t' '$3==15600 {print}'
This command prints every line whose third column (Allele ID) is 15600. The output shows that Allele ID 15600 is part of Haplotype 561, which has a classification (yes in column 4), and is also represented by Variation ID 242821, a set with a single variant that has no classification in ClinVar (no in column 4):
561 Haplotype 15600 yes
242821 Variant 15600 no
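Both awk lookups above can also be answered in one pass by loading variation_allele.txt.gz into dictionaries. This minimal Python sketch assumes the file is tab-delimited with the column order described above (Variation ID, type, Allele ID, yes/no classification) and that header lines start with "#"; the function names are hypothetical.

```python
import gzip
from collections import defaultdict


def load_variation_allele(lines):
    """Return (alleles_of, variations_of, classified) from variation_allele rows."""
    alleles_of = defaultdict(list)     # Variation ID -> [Allele ID, ...]
    variations_of = defaultdict(list)  # Allele ID -> [Variation ID, ...]
    classified = {}                    # Variation ID -> True if column 4 is "yes"
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue                   # skip header/comment and blank lines
        variation_id, _type, allele_id, status = line.rstrip("\n").split("\t")[:4]
        alleles_of[int(variation_id)].append(int(allele_id))
        variations_of[int(allele_id)].append(int(variation_id))
        classified[int(variation_id)] = status == "yes"
    return alleles_of, variations_of, classified


def load_file(path):
    """Read a local copy of variation_allele.txt.gz."""
    with gzip.open(path, "rt") as handle:
        return load_variation_allele(handle)
```

With the rows shown above, alleles_of[561] would hold [15600, 38381, 38382] and variations_of[15600] would hold [561, 242821].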
Other identifiers in ClinVar's XML files
ClinVar's XML files report integer @ID values for many elements besides MeasureSet and Measure. These values are the unique keys of the relational database tables that ClinVar uses to represent the data. At present, these values (e.g. //Trait/@ID) can be used to match an element from one report to another, but ClinVar does not consider them public identifiers and reserves the right to change the numbering system.
Identifiers specific to other NCBI resources
ClinVar maintains identifiers for records in several other NCBI resources, including Bookshelf, dbSNP, dbVar, Gene, MedGen (concept unique identifiers, CUIs), PubMed, and PubMed Central.
- In the XML, these are reported in the XRef element.
- In the tab-delimited directory, these are reported in
  - cross_references.txt
  - var_citations.txt
Identifiers specific to resources outside of NCBI
ClinVar also maintains identifiers for records in resources outside of NCBI.
- In the XML, these are reported in the XRef element.
- In the tab-delimited directory, these are reported in
  - cross_references.txt
  - var_citations.txt
- In the main clinvar directory, gene_condition_source_id reports the gene-disease relationships used in ClinVar, Gene, GTR, and MedGen.
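A minimal Python sketch for grouping XRef identifiers by resource within one XML record. It assumes, beyond what is stated here, that each XRef element carries DB and ID attributes; check the XML schema described in the README before relying on this. The fragment and its ID values are made up.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragment; the identifiers are placeholders, not real records.
SAMPLE = """
<SimpleAllele>
  <XRef DB="dbSNP" ID="1111"/>
  <XRef DB="dbVar" ID="nsv2222"/>
  <XRef DB="dbSNP" ID="3333"/>
</SimpleAllele>
"""


def xrefs_by_db(element: ET.Element) -> dict:
    """Group every descendant XRef/@ID by its XRef/@DB value."""
    grouped = defaultdict(list)
    for xref in element.iter("XRef"):
        db, identifier = xref.get("DB"), xref.get("ID")
        if db and identifier:
            grouped[db].append(identifier)
    return dict(grouped)


print(xrefs_by_db(ET.fromstring(SAMPLE)))
# {'dbSNP': ['1111', '3333'], 'dbVar': ['nsv2222']}
```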