FAQ about using ClinVar and understanding its data display

Please see our submission FAQ for questions about the submission process.

How to search ClinVar

  1. How do I search ClinVar efficiently?
  2. How do I retrieve a list of variants in a gene in order of location?
  3. What groups of records does ClinVar precalculate?
  4. How can I sort results in gene-specific order?
  5. How can I retrieve batches of data from ClinVar?
  6. ClinVar doesn't have much data for my variant of interest. Where can I find more information?
  7. When I query ClinVar, sometimes the counts for the clinical significance filters don't match the data in the results table. What does this mean?
  8. How can I find other variants at a location, or other variants that cause the same protein change?

Using the web display

  1. What do SCV and RCV mean?
  2. Why are there different assertions of clinical significance for the same variant?
  3. Why are there multiple RCV accessions in ClinVar for the same variant?
  4. I'm interested in a variant that was reported with the condition "not specified". What does that mean?
  5. What does it mean if a ClinVar record is "criteria provided, single submitter" but has more than one submission?
  6. I think a variant in ClinVar has the wrong clinical significance. What should I do about that?
  7. I see multiple dates on a ClinVar record. What do they mean?
  8. How can I tell if a record became public on the ClinVar website after the monthly VCF file was created?
  9. When a variant may lie in multiple genes, what does ClinVar report?
  10. A variant in ClinVar is described as a 1-nt deletion in the genomic DNA, but a 60-nt deletion in the mRNA. Is that an error?
  11. I followed a link to the Breast Cancer Information Core (BIC), but it did not work. What do I do?
  12. A variant in ClinVar is described as "suspect". What does this mean?
  13. My browser does not display the pdf file documenting Assertion Method. What do I do?

Data sources and processing

  1. What data sources are included in ClinVar?
  2. Why don't ClinVar records include HGMD identifiers?
  3. Why do some variants in ClinVar have no genomic location?
  4. Is a new version number assigned for any change in an SCV record?
  5. When there are multiple RefSeqs for a gene, does ClinVar select a subset to use for reporting?
  6. What is ClinVar's convention for representing the location of variation with length differences when there are multiple options, left or right justified?
  7. I included specific ages and geographic origins for individuals in my submission; why are they reported as a range and a larger geographic region?

Reports

  1. Why doesn't the VCF file contain all the data in the XML file?
  2. Where can I find statistics about the number of ClinVar submissions?
  3. On the submitter summary page, why is the number of submissions sometimes different from the number of records from that submitter?

Citing ClinVar

  1. How should I refer to a ClinVar record in written reports?
  2. How should I reference ClinVar?

How to search ClinVar

How do I search ClinVar efficiently?

As documented in more detail here, ClinVar can be searched with terms like

Note that by default, searching uses the exact search terms provided; for example, searching for "Noonan" finds records that include the word Noonan but does not find records with the word "Noonan's". Consider doing a wild-card search like "Noonan*" if you want to expand your search. Also, ClinVar queries search all fields of data by default. More information on how to narrow your query by searching particular fields is available in the ClinVar help document. If you have favorite queries that you run periodically, you can log in to MyNCBI and save your searches. Saved searches can be run on-the-fly or you can receive regular email updates with results of the search.
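
The exact, wild-card, and field-restricted searches described above can also be expressed as URLs for scripted use. The sketch below only builds the URLs; it assumes NCBI's public E-utilities ESearch endpoint (not named in this FAQ), and the helper name clinvar_search_url is hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (an assumption; this FAQ does not name it).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def clinvar_search_url(term, field=None, retmax=20):
    """Build an ESearch URL for ClinVar; `field` optionally restricts the term."""
    if field:
        term = f"{term}[{field}]"
    params = {"db": "clinvar", "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"

exact_url = clinvar_search_url("Noonan")                # exact term only
wildcard_url = clinvar_search_url("Noonan*")            # also matches "Noonan's"
gene_url = clinvar_search_url("ptpn11", field="gene")   # search one field only
print(wildcard_url)
```

Building the URL separately from fetching it makes it easy to inspect the query before sending it.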

How do I retrieve a list of variants in a gene in order of location?

There are several options. You can search in ClinVar:

  • search by gene symbol
  • results are returned in order of genomic location
    • variants in genes on the plus strand of the chromosome are in ascending order
    • variants in genes on the minus strand of the chromosome are in descending order; at this time we do not have an option to make these variants sort in the opposite order
  • use the "Send to" in the upper right of the page to save the results to a file

You can start in Variation Viewer:

  • search by gene symbol
  • use the filters in the lower left to limit to variants in ClinVar (click "Yes")
  • in the table of results, click "Download" in the upper left to save the results to a file; note that these results are limited to variants that have been mapped on the genome sequence

Or you can get the data from the ftp site. You can look in the VCF file or the variant_summary.txt file for your gene of interest and parse the protein expressions to order by protein location.
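
As a rough illustration of parsing protein expressions to order by protein location, the sketch below sorts variant names by the residue number in a three-letter expression such as p.Gly12Asp. The column names GeneSymbol and Name are assumptions about variant_summary.txt, and the sample rows are made up.

```python
import csv
import io
import re

# Matches the residue number in a three-letter protein expression such as
# "(p.Gly12Asp)"; other forms (e.g. one-letter codes) are not handled here.
PROTEIN_POS = re.compile(r"p\.[A-Za-z]{3}(\d+)")

def protein_position(name):
    """Return the first residue number found in a variant name, or None."""
    match = PROTEIN_POS.search(name)
    return int(match.group(1)) if match else None

def variants_by_protein_position(handle, gene):
    """Return names of variants in `gene`, ordered by protein position."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = [r for r in reader if r["GeneSymbol"] == gene]
    located = [(protein_position(r["Name"]), r["Name"]) for r in rows]
    # Variants without a parsable protein expression are left out.
    located = [item for item in located if item[0] is not None]
    return [name for _, name in sorted(located)]

# Tiny made-up sample standing in for variant_summary.txt.
sample = io.StringIO(
    "GeneSymbol\tName\n"
    "GENE1\tNM_000000.0(GENE1):c.299A>G (p.Lys100Arg)\n"
    "GENE1\tNM_000000.0(GENE1):c.35G>A (p.Gly12Asp)\n"
    "GENE2\tNM_000001.0(GENE2):c.5C>T (p.Ala2Val)\n"
)
ordered = variants_by_protein_position(sample, "GENE1")
print(ordered)
```

For the real file, the same function could be given a handle opened with gzip.open(path, "rt"), after checking the actual header names.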

       

      What groups of records does ClinVar precalculate?

      Recurrent concepts in ClinVar are captured in what are termed properties in NCBI's Entrez system. These properties are created to facilitate finding data characterized by standard values. Some of these properties are exposed as the filters you see on the result set, but there are many more. You can scan the names of properties and the counts of records with each property by using the advanced query option: www.ncbi.nlm.nih.gov/clinvar/advanced/.

      • Select Properties from the Builder/All fields menu.
      • Click on Show index list to the right of that menu.

      Descriptions of each property, with sample queries, are provided in this document.

      How can I sort results in gene-specific order?

      When a gene is on the negative strand, the location of a variation relative to the gene sorts in opposite order to the location on the genome. The position column on ClinVar's tabular display is the chromosome location, so the ordering seems counterintuitive. At present we do not provide a method to sort in the opposite order.

      How can I retrieve batches of data from ClinVar?

      ClinVar does not currently support a batch query interface, but there are several approaches that might still meet your needs:

      Approaches to process batches of data

      Use case: Variant-specific data for a list of genes
      Possible solutions:
      • Query ClinVar by listing the genes using the boolean OR, and download the results interactively or using e-utilities:
        http://www.ncbi.nlm.nih.gov/clinvar?term=spred1[gene]%20OR%20shoc2[gene]%20OR%20raf1[gene]%20OR%20ptpn11[gene]
      • Download the file ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/variant_summary.txt.gz and process to extract gene-specific lines.
      • Download the full data extract ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/xml/ClinVarFullRelease_00-latest.xml.gz and process to extract gene-specific records.

      Use case: Variant-specific data for a list of conditions
      Possible solutions:
      • Use the same approach as for genes, but use instead a list of identifiers such as MIM numbers or Concept UIDs (CUI) from MedGen. The file variant_summary.txt reports identifiers for phenotypes, but not their names.

      Use case: Gene-disease relationships
      Possible solutions:
      • ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/gene_condition_source_id reports gene-disease relationships used in ClinVar with attribution to the source. Current sources are OMIM, GeneReviews, and some from NCBI staff curation. This report does include all genes with variants asserted to result in a disease and submitted to ClinVar.
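
The boolean OR query for a list of genes can be assembled programmatically. This minimal sketch reproduces the example URL given for that approach; the function names gene_or_query and clinvar_url are hypothetical.

```python
from urllib.parse import quote

def gene_or_query(genes):
    """Join gene symbols into a single term using the boolean OR."""
    return " OR ".join(f"{gene}[gene]" for gene in genes)

def clinvar_url(term):
    """Percent-encode spaces but keep the square brackets readable."""
    return "http://www.ncbi.nlm.nih.gov/clinvar?term=" + quote(term, safe="[]")

url = clinvar_url(gene_or_query(["spred1", "shoc2", "raf1", "ptpn11"]))
print(url)
```

The same term string could be passed to e-utilities instead of the web URL if the results are to be downloaded by a script.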

      ClinVar doesn't have much data for my variant of interest. Where can I find more information?

      If ClinVar can tell you are searching for a variant, the search results include this text:

      You may also find information on this variant by searching: All NCBI Databases, Google

      with links to search other NCBI databases or Google for that variant.

      Also note that submissions are not necessarily comprehensive; the fact that a laboratory has not submitted a variant does not necessarily mean that they have not seen the variant. Laboratories may submit their data to ClinVar in batches, so your variant may be submitted to ClinVar later. Consider setting up a saved search in your NCBI account for variants that you want to follow.

      When I query ClinVar, sometimes the counts for the clinical significance filters don't match the data in the results table. What does this mean?

      The results of a ClinVar search are for variants. However, the clinical significance filters are based on the clinical significance reported for each submission, or SCV; the exception is the "conflicting interpretations" filter, which is based on the clinical significance for the variant. A variant may have multiple SCVs with different clinical significance values. For example, a single variant with one SCV reporting "Pathogenic" and a second SCV reporting "Benign" would be counted in three filters: Pathogenic, Benign, and Conflicting interpretations.
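
The mismatch can be illustrated with a toy count. The sketch below is a simplification: it treats any disagreement between SCVs as a conflict, which is an assumption rather than ClinVar's actual rule, and the variant labels are made up.

```python
from collections import Counter

def filters_for_variant(scv_significances):
    """Return the set of filters in which one variant would be counted."""
    filters = {s.capitalize() for s in scv_significances}
    # Simplification: any disagreement between SCVs is treated as a conflict.
    if len(filters) > 1:
        filters.add("Conflicting interpretations")
    return filters

# Made-up data: variant label -> significance reported by each SCV.
variants = {
    "variant A": ["Pathogenic", "Benign"],
    "variant B": ["Pathogenic"],
}

filter_counts = Counter()
for significances in variants.values():
    filter_counts.update(filters_for_variant(significances))

print(len(variants), dict(filter_counts))
```

Here two variants produce four filter counts in total, which is why the filter counts and the number of rows in the results table need not agree.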

      How can I find other variants at a location, or other variants that cause the same protein change?

      Variation Viewer is recommended for searching based on location. From a ClinVar record, click on the Variation Viewer link in the Browser views section on the right-hand side. This browser shows all the variation in ClinVar, dbSNP, and dbVar in the region of the variant of interest. Other variants that cause the same protein change can be searched using the protein HGVS expression, or by querying for the gene symbol and the one- or three-letter protein change.

      Using the web display

      What do SCV and RCV mean?

      Each individual submission to ClinVar, defined by the combination of variant, condition, and submitter, receives an accession number with the prefix SCV ("submission to ClinVar"). The accession number is also versioned, so that updates that the submitter makes to their record are tracked. ClinVar aggregates SCV records for the same combination of variant and condition into a "reference ClinVar" record, which receives an accession number with the prefix RCV. The RCV record can be viewed on the web by clicking the "See supporting ClinVar records" link on the left side of the variation page.

      Why are there different assertions of clinical significance for the same variant?

      ClinVar is an archive for assertions of clinical significance made by our submitters. If multiple groups have reported different values for clinical significance for the same variant, we report that there is a conflict and show all of the submitted values for clinical significance. These records have "conflicting data from submitters" as the clinical significance and the review status; the review status is also represented as 0/4 stars to indicate the lack of certainty in the interpretation of the variant. ClinVar does not arbitrate and resolve these conflicts. However, if we have a submission for that variant from an expert panel or a professional society, the assertion made by the expert panel or professional society is displayed and different interpretations from other submitters are not reported as conflicts.

      Why are there multiple RCV accessions in ClinVar for the same variant?

      An accession in ClinVar is based on a variant-condition combination, not the variant alone. This representation was selected so that distinct accessions could be assigned to variants that result in distinct disorders. Each submitted interpretation is assigned an accession of the format SCV000000000.0, and the version is incremented if the submitter updates the record (e.g. SCV000000001.1 would be updated to SCV000000001.2). Each unique variant-condition combination is aggregated into a ClinVar record with an accession of the format RCV000000000.0.
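
Scripts that handle these accessions usually need to split them into prefix, number, and version. The following is a minimal sketch that assumes the nine-digit layout shown in the formats above; the names Accession, parse_accession, and is_update_of are hypothetical.

```python
import re
from typing import NamedTuple

class Accession(NamedTuple):
    prefix: str    # "SCV" (single submission) or "RCV" (aggregate record)
    number: int
    version: int

# Assumes nine digits before the dot, as in SCV000000000.0.
ACCESSION = re.compile(r"(SCV|RCV)(\d{9})\.(\d+)")

def parse_accession(text):
    """Split a versioned accession such as 'SCV000000001.2' into its parts."""
    match = ACCESSION.fullmatch(text)
    if match is None:
        raise ValueError(f"not a versioned SCV/RCV accession: {text!r}")
    prefix, number, version = match.groups()
    return Accession(prefix, int(number), int(version))

def is_update_of(newer, older):
    """True if `newer` is a later version of the same record as `older`."""
    same_record = (newer.prefix, newer.number) == (older.prefix, older.number)
    return same_record and newer.version > older.version

old = parse_accession("SCV000000001.1")
new = parse_accession("SCV000000001.2")
print(new, is_update_of(new, old))
```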

      In these early days of ClinVar, when there is less consensus about how to describe the clinical condition that results from variation, we recognize there are often multiple RCV accessions assigned to the same variant. We anticipate this will change over time as expert panels review the data and decide how to describe the condition.  When updates result in matching conditions, the RCV accessions will be merged.

      Some variants have more than one RCV because the variant has been reported for distinct disorders. An example is Variation ID: 14318 which has been reported for both Obesity and Schizophrenia.

      ClinVar's current default web display is variation-centric rather than organized by variation-condition combinations.  ClinVar provides an overview comparing these displays.

      I'm interested in a variant that was reported with the condition "not specified". What does that mean?

      Some submitters want to report that a variant is benign with respect to a specific condition, which leaves the possibility that it is clinically relevant for a different condition. Other submitters want to report that the variant is generally benign, in that it does not appear to cause any genetic disorder that should be observable because it is highly penetrant. ClinVar, in collaboration with members of the ClinGen project, requests that submitters provide "not specified" as the condition to indicate that they are not specifying any single condition but rather that the variant is generally benign. Use of this term for this kind of submission may be re-evaluated in the future.

      What does it mean if a ClinVar record is "criteria provided, single submitter" but has more than one submission?

      The review status, such as "classified by single submitter", is based on submissions in which a clinical significance was provided. Some submissions to ClinVar lack an explicit statement of clinical significance. These are included in the number of submissions for a variant, but they do not contribute to the variant's review status, which considers whether or not the variant was classified.

      I think a variant in ClinVar has the wrong clinical significance. What should I do about that?

      The goal of the ClinVar database is to represent the clinical significance values provided by our submitters; therefore, ClinVar staff cannot change the clinical significance that is reported to us. If you think a variant in ClinVar has been classified incorrectly, we encourage you to submit your own interpretation of the variant along with your evidence, such as recent publications. The ClinVar submission wizard can be used to submit a single interpretation with minimal time commitment. Although your submission will not change the classification from other submitters, it will change the overall clinical significance to indicate that there are conflicting reports of pathogenicity and users should look at all the available evidence. ClinVar records with conflicts may also prompt the previous submitters or expert panels to review the variant classification.

      I see multiple dates on a ClinVar record. What do they mean?

      ClinVar records multiple date stamps.

      • In the Clinical assertions tab, the Clinical significance (Last evaluated) column includes the date that the submitter last evaluated the significance of the variant.
      • At the bottom left side of a ClinVar record is a date called "Last Updated". This is the date that ClinVar last updated the record. This includes updates to submissions for the variant as well as updates to data that ClinVar provides, such as links to related resources.

      How can I tell if a record became public on the ClinVar website after the monthly VCF file was created?

      At the bottom left side of a ClinVar record is a date called "Last Updated". If this date is after the date on the VCF file, then the web is displaying data that is newer than data in the VCF file. The website is updated weekly, while the VCF file is created monthly.

       

      When a variant affects multiple genes, what does ClinVar report?

      There are several situations in which a variation may be considered to have a relationship to more than one gene. The reported gene or genes, and the preferred designations, are selected as follows:

      • Submitted as: location on a cDNA, with or without identifying the gene
        Reported as: the gene as submitted or calculated from the cDNA, and preferred name as calculated from the cDNA reference standard for that gene
      • Submitted as: genomic location covering multiple non-overlapping genes, with no gene specified
        Reported as: all calculated genes in the region based on the most recent NCBI annotation release. The preferred designation is a genomic HGVS expression without a gene symbol.
      • Submitted as: genomic location covering multiple overlapping genes, including those with shared exons
        Reported as: all genes are reported, but the preferred description is based on selection of the RefSeq that corresponds to an exonic location. If the variation is in an exon of more than one gene, then the preferred description will not include a gene symbol, and will be based only on the genomic location.

       

      A variant in ClinVar is described as a 1-nt deletion in the genomic DNA, but a 60-nt deletion in the mRNA. Is that an error?

      This variant may represent a 1-nt deletion in genomic DNA that results in exon skipping, and therefore a larger deletion in the mRNA.

      I followed a link to the Breast Cancer Information Core (BIC), but it did not work. What do I do?

      To display data from the Breast Cancer Information Core (BIC), you must be a member and be logged in. Registration is available online.

      Secondly, we have discovered that some URLs to BIC contain spaces, and when ClinVar provides that URL sometimes NHGRI's website translates that space twice. If you are already logged in, the space will not be translated twice and the URL from ClinVar should function as expected.

      The URL https://research.nhgri.nih.gov/projects/bic/Member/cgi-bin/bic_query_result.cgi?table=brca2_exonsnt=995base_change=del%20CAAAT will fail because NHGRI converts it to https://research.nhgri.nih.gov/projects/bic/Member/cgi-bin/bic_query_result.cgi?table=brca2_exonsnt=995base_change=del%2520CAAAT. Until this bug is fixed at NHGRI, you can either

      • Edit the URL back to https://research.nhgri.nih.gov/projects/bic/Member/cgi-bin/bic_query_result.cgi?table=brca2_exonsnt=995base_change=del%20CAAAT
      • Edit the URL to https://research.nhgri.nih.gov/projects/bic/Member/cgi-bin/bic_query_result.cgi?table=brca2_exonsnt=995base_change=del CAAAT

      and do the search. Then the query record at BIC will be displayed.
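
The change from %20 to %2520 is what percent-encoding produces when it is applied twice: the space becomes %20, and on the second pass the percent sign itself becomes %25. A short sketch using Python's standard library shows the effect; it illustrates the encoding only, not NHGRI's server code.

```python
from urllib.parse import quote, unquote

raw = "del CAAAT"
encoded_once = quote(raw)             # space -> %20
encoded_twice = quote(encoded_once)   # the "%" itself -> %25, giving %2520

print(encoded_once)
print(encoded_twice)

# Decoding once undoes only the outer layer of encoding.
assert unquote(encoded_twice) == encoded_once
assert unquote(encoded_once) == raw
```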

      A variant in ClinVar is described as "suspect". What does this mean?

      This annotation is imported from dbSNP, and indicates a suspected false positive. See dbSNP's documentation for more information.

      My browser does not display the file documenting Assertion Method. What do I do?

      Settings in some browsers may need to be adjusted to display files provided as pdf. There are several options you can use to display the contents of the file:

      1. Use Firefox. At the time of this writing, the .pdf files are displayed with no difficulty.
      2. Download the file, and then use your own tools to display it
        1. Right click on the name of the assertion method.
        2. Select Save link as... and define the location to save the file.
        3. Use your Acrobat Reader to read the file from the directory where you saved it.
      3. Alter the setting for your plugins (Chrome)
        1. Connect to chrome://settings/content
        2. Follow the link to Manage individual plugins
        3. This will open chrome://plugins
        4. Disable Chrome PDF viewer
        5. Enable Adobe Reader
        6. Go back to chrome://settings/content and click on Done
      4. Review compatibility settings (Internet Explorer)
        https://msdn.microsoft.com/en-us/library/dn321449.aspx

      Data Sources and Processing

      What data sources are included in ClinVar?

      From the ClinVar homepage, click on the Statistics link in the navigation bar at the top of the page. This takes you to a summary of ClinVar's submitters.

      Why don't ClinVar records include HGMD identifiers?

      Allele information from HGMD is not publicly available, so ClinVar is unable to connect variants accurately to the appropriate record in HGMD. The files ClinVar previously provided on the ftp site (ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar) were removed at HGMD's request.

      Why do some variants in ClinVar have no genomic location?

      ClinVar accepts the description of the variant provided by the submitter. We believe it helps our public to find records, even though the location of the variant on a defined sequence may be uncertain. In some cases, the original assay was at the protein level, and the numbering system for that protein is uncertain or the nucleotide change that may generate the protein change is indeterminate. In other cases, deletions have been reported relative to a transcript, and it is not clear whether the nucleotide change in the genome resulted from a corresponding genomic change, or aberrant splicing generated from another genomic location. Many submissions do have supporting citations, but ClinVar does not have the resources to review the literature to establish the precise nucleotide locations. We welcome submissions that would improve these data for us.

      In other words, ClinVar does assign accessions to submissions that represent human variation descriptively, rather than based on an explicit public nucleotide sequence. In that situation, the search results table shows no nucleotide location for the variant and no links are provided to viewers. If the location of the allele is determined, the record is updated.  This may not require re-submission from the submitter, if NCBI staff are able to establish the location of the variant based on review of the literature.

      Is a new version number assigned for any change in an SCV record?

      Any change that the submitter makes to any SCV record causes the version number to increase. The SCV version number does not increase if there is a change in data that NCBI provides, such as allele frequencies, additional HGVS expressions or a MedGen ID for a condition. Data that NCBI provides are packaged only in the RCV accession, and changes to those data do not cause an RCV version to increment either. A new version is assigned to an RCV if there is a new version of an SCV, or if more SCVs are aggregated into an RCV.

      When there are multiple RefSeqs for a gene, does ClinVar select a subset to use for reporting?

      Yes, ClinVar uses as its default the RefSeq cDNAs that have been selected as reference standards by the RefSeqGene/LRG collaboration. These sequences can be identified within the GenBank view of each cDNA by the comment worded as:

      This sequence is a reference standard in the RefSeqGene project.

      The full set can be retrieved from RefSeqGene's ftp site:
      ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/RefSeqGene/LRG_RefSeqGene

      You can identify all RefSeqs that have been classified as reference standards using these approaches:

      Sample queries

      • Goal: find all RefSeqGene sequences
        Approach: ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/RefSeqGene/
        Comment: The FTP site provides all sequences, as well as several reports about reference standard cDNAs and matching identifiers in LRG.
      • Goal: find the reference standard sequences for a gene
        Approach: "refseqgene standard"[Properties] AND fgfr3[gene]
        Comment: Use the nucleotide database, and apply the property 'refseqgene standard' with the additional qualifier of the gene symbol
      • Goal: find all reference standard transcripts
        Approach: "refseqgene standard"[Properties] AND biomol_rna[prop]
        Comment: Use the nucleotide database, and apply the property 'refseqgene standard' with the additional qualifier of the sequence type RNA

       

      What is ClinVar's convention for representing the location of variation with length differences when there are multiple options, left or right justified?

      To conform to conventions of HGVS notation, ClinVar represents the location of the sequence change at the right-most possible position. The VCF standard, however, bases the POS coordinate on the leftmost possible position of the variant. For this reason, the location represented by a dbSNP rs number may be left of the location represented by the HGVS notation.
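
The difference between the two conventions can be seen by sliding a deletion through a repeat. The sketch below is a simplified illustration on a made-up sequence, using 0-based, end-exclusive coordinates; it is not ClinVar's or dbSNP's normalization code, and shift_deletion is a hypothetical helper.

```python
def shift_deletion(ref, start, end, direction):
    """Slide the deleted interval ref[start:end] (0-based, end-exclusive) as far
    as possible to the 'left' or 'right' without changing the resulting sequence."""
    if direction == "right":
        # Moving right by one keeps the result only if the base leaving the
        # deletion equals the base entering it.
        while end < len(ref) and ref[start] == ref[end]:
            start, end = start + 1, end + 1
    elif direction == "left":
        while start > 0 and ref[start - 1] == ref[end - 1]:
            start, end = start - 1, end - 1
    else:
        raise ValueError("direction must be 'left' or 'right'")
    return start, end

def apply_deletion(ref, start, end):
    return ref[:start] + ref[end:]

ref = "GCACACAG"                                 # "CA" repeated three times
leftmost = shift_deletion(ref, 3, 5, "left")     # VCF-style justification
rightmost = shift_deletion(ref, 3, 5, "right")   # HGVS-style justification
print(leftmost, rightmost)
```

Both intervals describe the same resulting sequence, so two records that look different by position can represent the same deletion.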

      I included specific ages and geographic origins for individuals in my submission; why are they reported as a range and a larger geographic region?

      Based on concerns of identifiability, when a specific age is submitted for an individual, ClinVar reports the age as the corresponding decade. Similarly, small countries are reported as a larger geographic region, e.g. Costa Rica is publicly reported as Central America.
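
The generalization described above might be sketched as follows. The decade label format and the lookup table are hypothetical; only the Costa Rica example comes from this FAQ, and ClinVar's actual list of regions is not shown here.

```python
# Hypothetical lookup; only this one mapping is taken from the FAQ text.
REGION_FOR_COUNTRY = {"Costa Rica": "Central America"}

def age_to_decade(age):
    """Replace a specific age with its decade, e.g. 37 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"   # label format is an assumption

def generalize_origin(country):
    """Report a larger region when one is listed, else the country itself."""
    return REGION_FOR_COUNTRY.get(country, country)

print(age_to_decade(37), generalize_origin("Costa Rica"))
```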

      Reports

      Why doesn't the VCF file contain all the data in the XML file?

      ClinVar's VCF files are currently limited to variants in ClinVar that have a precise genomic location. Variants with imprecise start and stop, such as exon deletions and CNVs detected by microarray, are not included in ClinVar's VCF files at this time.

      The ClinVar VCF files can be retrieved from ClinVar's ftp site:

      ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/

      Where can I find who has submitted to ClinVar and statistics about number of ClinVar submissions?

      ClinVar reports statistics for the number of submissions, genes, and variants from ClinVar submissions. Total counts and counts per submitter are provided. Note that variation is represented according to the ClinVar data model, in that a variation is represented as a set which may have one or more members. For example, two variations submitted together in cis are members of a single set and are counted in the statistics as one variation.

      On the submitter summary page, why is the number of submissions sometimes different from the number of records from that submitter?

      The count of submissions on the submitter page is a count of the number of submitted interpretations from that submitter. This may include submissions for the same variant but with different conditions. The number of records returned in a search for that submitter is for the number of variants that the submitter has reported. Because the submitter may have reported multiple interpretations for the same variant with different conditions, the number of variants in the search results may be less than the number of submissions on the submitter summary page.
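
A toy count shows how the two numbers diverge. The data below are made up; each tuple stands for one submitted interpretation from a single submitter.

```python
# Made-up submissions from one submitter: (variant, condition).
submissions = [
    ("variant 1", "condition A"),
    ("variant 1", "condition B"),   # same variant, different condition
    ("variant 2", "condition A"),
]

submission_count = len(submissions)                          # submitter page
variant_count = len({variant for variant, _ in submissions}) # search results

print(submission_count, variant_count)
```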

      Other

      How should I refer to a ClinVar record in written reports?

      ClinVar records should be referred to with accession and version numbers. It is important to include the version number because the interpretation may change over time; the version number allows you to distinguish between previous and current versions. Note that the RCV accession number refers to the aggregate record for the variant and condition, while the SCV accession number refers to a specific submission for the variant and condition. If you need to build URLs based on a ClinVar accession, please note the instructions for constructing links to ClinVar.

      How should I reference ClinVar?

      An updated description of ClinVar has been published in Nucleic Acids Research.

      Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, Gu B, Hart J, Hoffman D, Jang W, Karapetyan K, Katz K, Liu C, Maddipatla Z, Malheiro A, McDaniel K, Ovetsky M, Riley G, Zhou G, Holmes JB, Kattman BL, Maglott DR. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018 Jan 4. PubMed PMID: 29165669.

      If you wish to reference a specific submission, please cite the SCV accession and version.

      If you wish to reference a specific ClinVar assertion, please cite the RCV accession and version.

Last updated: 2016-07-01T09:54:39-04:00