How ClinVar validates submissions

Validation during submission processing

  1. Variant
  2. Condition
  3. Variant-gene
  4. Variant-condition
  5. Miscellaneous
  6. Interpretation

Information reported after submission processing

Validation during submission processing

ClinVar analyzes the content of submissions and validates selected elements. This analysis includes both automated checks and manual checks by curators. Some checks result in rejecting a submission; others allow the submission to proceed, but the questioned information is returned to the submitter for review.

Variant

Variant definition

A variant defined as a sequence change relative to a reference sequence is analyzed to verify that the reference allele occurs at the position reported on the reference sequence. This check is implemented for submissions based on HGVS expressions and on explicit chromosomal positions. If the definition of the variant cannot be validated, the submission is rejected and the submitter is asked to review it.

  • If a submitter declines to correct a variant description that ClinVar cannot validate, it may be processed as "non-validated".
  • ClinVar has some older submissions based on HGVS expressions that were not tested as rigorously, and those submissions are public. For those submissions, the HGVS expression is reported as non-validated. ClinVar is working with submitters to correct those records.
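
The reference allele check described above can be illustrated with a minimal sketch. It assumes the reference sequence is already available as a plain string and that positions are 1-based; the function name and the toy sequence are hypothetical and are not ClinVar's actual code.

```python
def reference_allele_matches(reference_seq: str, position: int, ref_allele: str) -> bool:
    """Return True if ref_allele occurs at the 1-based position on reference_seq."""
    if position < 1 or not ref_allele:
        return False
    start = position - 1  # convert the 1-based coordinate to a 0-based index
    end = start + len(ref_allele)
    if end > len(reference_seq):
        return False  # the allele would run past the end of the reference
    return reference_seq[start:end].upper() == ref_allele.upper()


# Toy sequence for illustration only; not a real reference accession.
toy_reference = "ATGGCCTTAG"
print(reference_allele_matches(toy_reference, 5, "C"))  # reference allele agrees
print(reference_allele_matches(toy_reference, 5, "T"))  # reference allele disagrees
```

In this sketch, a result of False corresponds to a variant definition that cannot be validated.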

HGVS

  • ClinVar validates the format of HGVS expressions that are submitted. Some less frequently used HGVS standards may not be included in our validation code; thus expressions that we cannot validate are called "non-validated", rather than "invalid".
  • NCBI is testing code that compares different HGVS expressions to determine if they describe the same allele. ClinVar does identify some duplicates from one submitter when mapping submissions based on different transcript or RefSeqGene accessions to a chromosomal location, and rejects the duplicate. However, we recognize that we still do not identify all duplicates, particularly those based on different HGVS expressions in regions of repeat sequence, or when HGVS expressions are provided as insertions or indels when the HGVS standard requires that they be duplications or inversions, respectively.
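
One of the cases mentioned above, an insertion that the HGVS standard would describe as a duplication, can be sketched as follows. This is a simplified illustration, not NCBI's comparison code: it assumes a plain-string reference, a 1-based flank position, and it only looks at the sequence immediately adjacent to the insertion site. The function name is hypothetical.

```python
def insertion_should_be_duplication(reference_seq: str, after_position: int, inserted: str) -> bool:
    """Check whether an insertion merely copies the sequence next to the insertion site.

    The inserted bases are placed between the 1-based positions
    after_position and after_position + 1 of reference_seq.
    """
    n = len(inserted)
    if n == 0 or not 0 <= after_position <= len(reference_seq):
        return False
    ref = reference_seq.upper()
    ins = inserted.upper()
    preceding = ref[max(0, after_position - n):after_position]
    following = ref[after_position:after_position + n]
    # If the inserted bases repeat either neighbour, the change is a tandem copy.
    return ins == preceding or ins == following


toy_reference = "ATGGCCTTAG"  # toy sequence, not a real accession
print(insertion_should_be_duplication(toy_reference, 6, "CC"))
print(insertion_should_be_duplication(toy_reference, 6, "GA"))
```

A complete comparison would also need to handle repeat regions, where the same allele can be described at several positions; this sketch does not attempt that.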

Consistency checking

  • If identifiers are provided for a variant, e.g. an rs number, ClinVar verifies that the genomic location for that identifier is consistent with the variant description.
  • Legacy descriptions are not verified.
  • If a submission is updated, ClinVar stops the processing if a change in the variant definition is detected. The update is allowed to continue only if the submitter verifies the previous definition was an error. ClinVar then assigns a new AlleleID and a new VariationID when it updates the version of the SCV accession.
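
The identifier consistency check above can be sketched as a comparison between the location on record for an identifier and the location implied by the variant description. The lookup table, the placeholder identifier, and all names below are hypothetical; a real check would consult dbSNP rather than an in-memory dictionary.

```python
from typing import Dict, NamedTuple, Optional


class Location(NamedTuple):
    assembly: str
    chromosome: str
    start: int
    stop: int


def identifier_is_consistent(
    identifier: str,
    described: Location,
    known_locations: Dict[str, Location],
) -> Optional[bool]:
    """Return True/False for a known identifier, or None if it cannot be looked up."""
    known = known_locations.get(identifier)
    if known is None:
        return None  # nothing to compare against
    return known == described


# Placeholder data for illustration; "rsEXAMPLE1" is not a real rs number.
known_locations = {"rsEXAMPLE1": Location("GRCh38", "7", 1000, 1000)}
print(identifier_is_consistent("rsEXAMPLE1", Location("GRCh38", "7", 1000, 1000), known_locations))
print(identifier_is_consistent("rsEXAMPLE1", Location("GRCh38", "7", 2000, 2000), known_locations))
```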

Multiple variants

  • If multiple variants are submitted together as a single record, e.g. as a haplotype or as a compound heterozygote, curators confirm the submitter's intent. This may result in splitting a submission. For example, two variants submitted as a compound heterozygote and reported as pathogenic for an autosomal recessive disease should be split into two distinct submissions, each reporting an individual variant as pathogenic for the disease.

Condition

  • If a submission defines a disease or phenotype with a database identifier, ClinVar validates:
    • that the identifier is valid for the database
    • that the identifier is for a disease or phenotype. For example, a MIM number for a disease is valid; a MIM number for a gene is not.
  • If a submission defines a disease or phenotype with both a database identifier and a name, ClinVar validates that the identifier and name represent the same concept.
  • If multiple diseases or phenotypes are submitted as a single record, curators confirm the submitter's intent. For example, a variant that is reported as pathogenic for two diseases on a single record indicates that the variant results in the combination of those diseases, not that the variant can cause either disease. For the latter case, the submission is split into two distinct submissions, one for the variant with each of the diseases.
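
The identifier checks for conditions listed above can be sketched as follows. The catalog, its placeholder entries, and the function name are assumptions made for illustration; the identifiers are not real MIM numbers, and a real check would query the source database.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical catalog keyed by (database, identifier); entries are placeholders.
EXAMPLE_CATALOG: Dict[Tuple[str, str], Dict[str, str]] = {
    ("OMIM", "000001"): {"kind": "phenotype", "name": "Example disease"},
    ("OMIM", "000002"): {"kind": "gene", "name": "EXAMPLE1"},
}


def validate_condition(
    database: str,
    identifier: str,
    name: Optional[str] = None,
    catalog: Dict[Tuple[str, str], Dict[str, str]] = EXAMPLE_CATALOG,
) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    entry = catalog.get((database, identifier))
    if entry is None:
        return [f"{identifier} is not a valid identifier for {database}"]
    problems = []
    if entry["kind"] not in {"disease", "phenotype"}:
        problems.append(f"{identifier} describes a {entry['kind']}, not a disease or phenotype")
    if name is not None and name.strip().lower() != entry["name"].lower():
        problems.append(f"name '{name}' does not match identifier {identifier}")
    return problems


print(validate_condition("OMIM", "000001", "Example disease"))
print(validate_condition("OMIM", "000002"))
```

The name comparison here is a case-insensitive exact match, which is stricter than deciding whether two terms "represent the same concept"; synonym handling is left out of the sketch.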

Variant-gene

  • If a submitter specifies the gene affected by a variant, ClinVar verifies that the genomic location of that variant is within that gene. (in progress)
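
A location check of this kind reduces to an interval comparison. The sketch below assumes both locations are on the same assembly and that "within" means full containment; the source does not say whether partial overlap is accepted. The names and coordinates are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chromosome: str
    start: int  # 1-based, inclusive
    stop: int   # 1-based, inclusive


def variant_within_gene(variant: Interval, gene: Interval) -> bool:
    """Return True if the variant lies entirely inside the gene's interval."""
    return (
        variant.chromosome == gene.chromosome
        and gene.start <= variant.start
        and variant.stop <= gene.stop
    )


gene = Interval("7", 1000, 5000)  # placeholder coordinates
print(variant_within_gene(Interval("7", 1200, 1200), gene))
print(variant_within_gene(Interval("7", 4990, 5010), gene))
```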

Variant-condition

  • ClinVar is adding a check for cases where a MIM number is submitted for a disorder and that disorder is specific to a causative gene. The check determines whether the variant's location is consistent with a location in that gene.
  • ClinVar checks whether the submitter has already submitted an interpretation for the variant and condition. We are aware that the database contains some duplicate records that were submitted before checks were put in place. We are working with submitters to resolve the duplicates.
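
The duplicate check above can be pictured as grouping records by submitter, variant, and condition. The following is a minimal sketch under that assumption; the record fields and values are hypothetical and do not reflect ClinVar's internal schema.

```python
from typing import Dict, List, Tuple


def find_duplicates(records: List[Dict[str, str]]) -> List[Tuple[int, int]]:
    """Return (first_index, duplicate_index) pairs for repeated submissions."""
    first_seen: Dict[Tuple[str, str, str], int] = {}
    duplicates = []
    for index, record in enumerate(records):
        key = (record["submitter"], record["variant"], record["condition"])
        if key in first_seen:
            duplicates.append((first_seen[key], index))
        else:
            first_seen[key] = index
    return duplicates


records = [
    {"submitter": "Lab A", "variant": "variant-1", "condition": "Example disease"},
    {"submitter": "Lab B", "variant": "variant-1", "condition": "Example disease"},
    {"submitter": "Lab A", "variant": "variant-1", "condition": "Example disease"},
]
print(find_duplicates(records))
```

Only exact matches on the three fields are detected; two descriptions of the same allele written differently would need to be normalized first.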

Evidence

ClinVar validates:

Miscellaneous

  • For an update to an SCV accession, ClinVar verifies that the submitter is the owner of that record.
  • If a batch of submissions includes only variants of "uncertain significance", curators confirm that the variants were determined to be uncertain as the result of an interpretation process. If the variants are uncertain because they were not interpreted, they are not in scope for ClinVar.
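
Both checks above can be expressed as simple predicates. The sketch below is illustrative only: the function names and the submitter identifiers are hypothetical, and in practice the second check only flags a batch for a curator, who then confirms the submitter's process.

```python
from typing import List


def is_record_owner(updating_submitter_id: str, record_owner_id: str) -> bool:
    """An update to an SCV accession is accepted only from the record's owner."""
    return updating_submitter_id == record_owner_id


def needs_curator_confirmation(interpretations: List[str]) -> bool:
    """Flag a batch in which every variant is of uncertain significance."""
    return bool(interpretations) and all(
        value.strip().lower() == "uncertain significance" for value in interpretations
    )


print(is_record_owner("org-1", "org-2"))
print(needs_curator_confirmation(["Uncertain significance", "uncertain significance"]))
print(needs_curator_confirmation(["Uncertain significance", "Pathogenic"]))
```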

Interpretation

  • ClinVar does not validate the interpretation for a variant.
  • ClinVar does not determine which interpretation is correct when submitters disagree.
    • ClinVar represents the interpretation provided by each submitter.
    • ClinVar calculates an aggregate clinical significance based on submissions and indicates when there is conflict between submitters.
    • Submitters are encouraged to provide evidence for the interpretation so that users can understand why submitters may disagree with the interpretation.
    • A curated interpretation from an expert panel or a practice guideline overrules any conflict from other submitters.
  • ClinVar does not review criteria for interpretation used by submitters (assertion criteria).
    • The submitter may provide documentation of the categories used to classify variants and the criteria needed to categorize variants into each bin.
    • ClinVar staff may review this documentation to ensure that it describes categories and criteria, but they do not decide whether the categories and criteria are appropriate.
    • This documentation of assertion criteria is for users to evaluate how an interpretation was made and may help users understand why submitters disagree in their interpretation.
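
The aggregation behavior described above (report each submission, indicate conflict, and let an expert panel or practice guideline overrule conflict) can be sketched in a simplified form. This is not ClinVar's algorithm; its actual rules are more detailed and are not described here. The level names and the returned labels are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed labels for submissions that overrule conflicts from other submitters.
AUTHORITATIVE_LEVELS = {"expert panel", "practice guideline"}


def aggregate_significance(submissions: List[Tuple[str, str]]) -> str:
    """Aggregate (interpretation, level) pairs into one value or a conflict marker."""
    authoritative = [
        interpretation
        for interpretation, level in submissions
        if level in AUTHORITATIVE_LEVELS
    ]
    # Authoritative submissions, when present, are the only ones considered.
    considered = authoritative or [interpretation for interpretation, _ in submissions]
    distinct = {interpretation.strip().lower() for interpretation in considered}
    if not distinct:
        return "no interpretation"
    if len(distinct) == 1:
        return distinct.pop()
    return "conflict"


print(aggregate_significance([("Pathogenic", "submitter"), ("Benign", "submitter")]))
print(aggregate_significance([("Pathogenic", "expert panel"), ("Benign", "submitter")]))
```

Note that the sketch never judges which interpretation is correct; it only reports agreement, conflict, or the overruling value.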

Information reported after submission processing

After submission, a report is provided to the submitter based on checks done when submitted data is integrated into the database. This report is provided only as an FYI for the submitter, and it includes information such as:

  • ClinVar processed a variant description that could not be validated
  • the submitted HGVS expression uses a previous version of the reference sequence
  • the interpretation is inconsistent with the allele frequency (e.g. a pathogenic variant with a high allele frequency in GO-ESP, 1000 Genomes, or ExAC)
  • the interpretation was made for a novel gene-disease relationship
  • ClinVar has conflicting submissions for the same variant-disease relationship (ClinVar checks for this issue proactively but this check addresses historical issues with redundant records)
  • the variant is flagged in dbSNP as suspect
  • the submitter’s interpretation conflicts with the interpretation from an expert panel or practice guideline
  • the submitter’s interpretation differs from another submitted interpretation
  • the interpreted disease is idiopathic
  • the interpretation is “pathogenic” but no disease was provided
  • the interpretation changed but the date last evaluated did not change
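
A few of the report items above lend themselves to simple rule checks. The sketch below covers three of them. The record fields, the function name, and especially the allele frequency threshold are assumptions for illustration; the source does not state what frequency ClinVar treats as high.

```python
from typing import Dict, List, Optional


def report_notes(
    current: Dict,
    previous: Optional[Dict] = None,
    max_pathogenic_frequency: float = 0.05,  # assumed threshold, not ClinVar's
) -> List[str]:
    """Collect informational notes for the submitter; nothing here rejects a record."""
    notes = []
    interpretation = current["interpretation"].strip().lower()
    if interpretation == "pathogenic" and not current.get("condition"):
        notes.append("interpretation is pathogenic but no disease was provided")
    frequency = current.get("allele_frequency")
    if interpretation == "pathogenic" and frequency is not None and frequency > max_pathogenic_frequency:
        notes.append("interpretation is inconsistent with the allele frequency")
    if (
        previous is not None
        and previous["interpretation"].strip().lower() != interpretation
        and previous.get("date_last_evaluated") == current.get("date_last_evaluated")
    ):
        notes.append("interpretation changed but the date last evaluated did not change")
    return notes


previous = {"interpretation": "Likely pathogenic", "date_last_evaluated": "2016-01-01"}
current = {
    "interpretation": "Pathogenic",
    "condition": "",
    "allele_frequency": 0.2,
    "date_last_evaluated": "2016-01-01",
}
for note in report_notes(current, previous):
    print(note)
```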

Last updated: 2017-01-23T13:31:07-05:00