Towards robust clinical genome interpretation: developing a consistent terminology to characterize disease-gene relationships - allelic requirement, inheritance modes and disease mechanisms

PURPOSE: The terminology used for gene-disease curation and variant annotation to describe inheritance, allelic requirement, and both sequence and functional consequences of a variant is currently not standardized. There is considerable discrepancy in the literature and across clinical variant reporting in the derivation and application of terms. Here we standardize the terminology for the characterization of disease-gene relationships to facilitate harmonized global curation, and to support variant classification within the ACMG/AMP framework. METHODS: Terminology for inheritance, allelic requirement, and both structural and functional consequences of a variant used by Gene Curation Coalition (GenCC) members and partner organizations was collated and reviewed. Harmonized terminology with definitions and use examples was created, reviewed, and validated. RESULTS: We present a standardized terminology to describe gene-disease relationships, and to support variant annotation. We demonstrate application of the terminology for classification of variation in the ACMG SF 2.0 genes recommended for reporting of secondary findings. Consensus terms were agreed and formalized in both the Sequence Ontology (SO) and the Human Phenotype Ontology (HPO). GenCC member groups intend to use or map to these terms in their respective resources. CONCLUSION: The terminology standardization presented here will improve harmonization, facilitate the pooling of curation datasets across international curation efforts and, in turn, improve consistency in variant classification and genetic test interpretation.


Supplementary
Note: not all founder members had separate allelic requirement terms in addition to inheritance terms. Abbreviations: PAR - pseudoautosomal region.

G2P - Gene2Phenotype
Proposal
• An X-linked dominant condition would be curated as monoallelic_X_het, and we would understand that those diseases manifest when het or hem (or indeed hom/compound het, though this may be more severe or lethal).
• An X-linked recessive condition would be curated as monoallelic_X_hem and would not manifest when heterozygous (though it can manifest with an ameliorated phenotype, or manifest if there is skewed inactivation etc.). We intend that this is implicit in the term, as characteristic of many sex-linked disorders, and do not anticipate that an additional modifier term is needed to communicate this, unless the het phenotype is sufficiently distinct as to be classified as a different disease entity.
• Terms are specific to each disease-gene pair, so for example if there is good evidence for manifesting carriers of X_Hem disorders presenting in infancy/early childhood that would be coded as X_Hem in DDG2P. If carriers only have late onset cardiomyopathy (e.g. female carrier of DMD) they would be X_het in the Cardiac panel but not DD.
• Mosaic is intended for conditions that are typically lethal when constitutive.
• Imprinted requires that the abnormal allele be paternal or maternal in origin.
• Requires heterozygosity covers edge cases such as Craniofrontonasal dysplasia due to EFNB1, which requires heterozygosity and would not manifest (fully) if hemizygous. Importantly the mutant allele can be inherited from a normal or very mildly affected father.

Disease associated variant consequences:
High level terms to describe variant consequences:
• Dose Change
- Dose reduction
- Decreased gene product level
- Absent gene product
- Increased gene product level
• Altered gene product sequence
Notes:
• Dose reduction - for example PTCs (protein truncating), gene-disrupting SVs, and gene-deletions (assuming NMD-competent PTC, and with caveats about splicing)
• Increased gene product - for example non-disruptive gene duplications, some promoter or enhancer variants
• Altered gene product sequence - for example NMD-incompetent PTCs, other length-changing variants (inframe indels, stop loss), and missense.

Supplementary information 2. Using a framework of standardised terminologies to define inheritance, allelic requirement, disease-associated variant classes, and disease-associated variant consequence for gene-disease pairs.
This document provides a template and guidance for the curation of inheritance, allelic requirement and disease-associated variant consequences for gene-disease pairs already curated by ClinGen using standardised terminology.

Review of evidence:
ClinGen: Include summary of evidence to support gene-disease relationship. This can be found at https://clinicalgenome.org/ and searching for the specific gene. Paste web link for ClinGen evidence summary page here e.g. for KCNQ1 http://search.clinicalgenome.org/kb/genes/HGNC:6294.
If the summary page is not on the ClinGen website, it can sometimes be found in the Supplementary data of the ClinGen curation paper.

Review of source material:
Review and include the reference (PMID) for the relevant ClinGen gene-disease validity paper e.g. for hypertrophic cardiomyopathy PMID: 30681346. This is likely to include useful summary information and publications to refer to.

Other Literature review:
This is to gather new information, not to re-evaluate the gene-disease relationship. Evidence is collected primarily from published peer-reviewed literature, but can also be present in publicly accessible resources, such as variant databases. Up-to-date reviews from centres with particular expertise in a given gene or disease are particularly helpful.
Useful publication search engines include:
• PubMed
• Google Scholar
• LitVar
• GeneCards
• Mastermind
Other useful information:
• GeneReviews and the "Molecular Genetics" section
• OMIM
• ClinVar to search for relevant variant classes
• PanelApp (if using a resource like PanelApp, reference the assertion and check the original references)
As these gene-disease pairs have all been classified as "Definitive" or "Strong" by ClinGen, they are well established and there may be abundant information. The goal is not to re-evaluate the gene-disease validity, and the literature review therefore does not have to be exhaustive. The literature search should be focused on establishing inheritance pattern, allelic requirement and where possible disease-associated variant class and functional consequences.
For example, for some gene-disease pairs it may be well established that the pattern of inheritance is autosomal dominant but there may be a small number of reports of recessive inheritance. A broad search of the literature can determine if other modes of inheritance have been reported, using search terms:
• Gene AND disease AND ("recessive" OR "autosomal recessive" OR "homozyg*" OR "compound heterozyg*" OR "biallelic")
• Gene AND disease AND (dominant OR "autosomal dominant" OR monoallelic OR heterozyg*)
• Gene AND disease AND ("x-linked" OR "x linked" OR "X chromosome" OR "X linked dominant" OR "X linked recessive")
Any reports of a different mode of inheritance should be reviewed to see if they are relevant or not. For many genes, a second hit may lead to a more severe phenotype but that does not necessarily mean the inheritance follows a recessive or digenic pattern, as both the first and second hit would in fact cause disease in isolation.
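The three search strings above share one pattern, so they can be generated for any gene-disease pair. The following is a minimal sketch only: the function name `build_query` and the dictionary keys are hypothetical and not part of any curation tool, and the output is simply a text string to paste into a literature search engine.

```python
# Hypothetical helper for composing the literature search strings listed above.
# The term lists are copied from the guidance; names and structure are illustrative.
INHERITANCE_TERMS = {
    "recessive": ['"recessive"', '"autosomal recessive"', '"homozyg*"',
                  '"compound heterozyg*"', '"biallelic"'],
    "dominant": ['dominant', '"autosomal dominant"', 'monoallelic', 'heterozyg*'],
    "x_linked": ['"x-linked"', '"x linked"', '"X chromosome"',
                 '"X linked dominant"', '"X linked recessive"'],
}


def build_query(gene: str, disease: str, mode: str) -> str:
    """Return 'Gene AND disease AND (term OR term ...)' for one inheritance mode."""
    terms = " OR ".join(INHERITANCE_TERMS[mode])
    return f"{gene} AND {disease} AND ({terms})"


# Example using a gene-disease pair mentioned in this document.
for mode in INHERITANCE_TERMS:
    print(build_query("DSC2", "ARVC", mode))
```

Running all three modes for each pair is one way to check whether a less common mode of inheritance has been reported before the curated terms are finalised.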
For disease-associated variant consequence and mechanism, literature review should focus on establishing the most likely consequences. Curators should review the evidence for haploinsufficiency in the ClinGen Dosage Sensitivity curation (http://search.clinicalgenome.org/kb/genedosage?page=1&size=25&order=asc&sort=symbol&search=), pathogenic/likely pathogenic variant classes on ClinVar (Simple ClinVar can be a helpful tool to search ClinVar, see screen shot below and link http://simple-clinvar.broadinstitute.org) and other public variant databases where available. For well described genes, recent publications re-evaluating variants, expert reviews, meta-analyses, and reviews of burden testing are highly relevant.
It is not necessary to review every variant. However, if for example the predominant class of variant is missense but there are a small number of nonsense mutations reported, extra time should be spent determining whether there is sufficient evidence to include these as a pathogenic variant class before expanding the disease mechanism. Sufficient evidence could include segregation or functional evidence.
If high level reviews are not available for a gene-disease pair, then a broad literature search may be necessary e.g. Gene AND disease AND (variant OR mutation). For a variant class that would add to the predicted functional consequence to be included, there should be sufficient qualitative evidence to support it, such as segregation, functional or burden data.
Where there is uncertainty that cannot be resolved, a note should be made in the narrative summary. Include PMIDs where possible and/or links to other resources.

Inheritance and Allelic Requirement
List inheritance and allelic requirement terms and any number of appropriate modifier terms.
Use of modifiers enables recording of data important to reproductive advice and family screening.

Harmonised allelic requirement and Mendelian inheritance terms.
Abbreviations: HPO - Human phenotype ontology, PAR - pseudoautosomal region. Inheritance modifier terms - these optional terms can be combined with either inheritance terms or allelic requirement terms to provide additional information about the relationship of a disease-gene pair.

Definition (Parent term)
Typically mosaic Description of conditions in which, for example, constitutive mutation is lethal, and cases are exclusively or predominantly mosaic. A much lower variant allele fraction (VAF) cut-off would be needed in analysis pipelines.
Typically de novo HP:0025352 Description of conditions that are exclusively or predominantly observed due to de novo variants. In some cases, this may be due to the limited reproductive fitness of affected individuals.
Description of conditions in which all individuals with a given genotype exhibit the disease within a lifespan of 80 years. For example, penetrance of Neurofibromatosis type 1 due to NF1 is close to 100%.
Variable age of onset Description of conditions in which age of onset is highly variable and in which manifestation of the disease phenotype is not dependent on the age of the subject.
Typified by age-related onset HP:0003831
- Typically infantile onset
- Typically childhood onset
- Typically adult onset
Description of conditions in which age of onset is typically not congenital and in which manifestation of the disease phenotype is dependent on the age of the subject.

Congenital onset
Description of conditions which are manifest at or before birth, for example cleft lip or talipes.
Imprinted HP:0034338
- With maternal imprinting HP:0012275
- With paternal imprinting HP:0012274
Requires that the abnormal allele be paternal or maternal in origin, depending on the disease-gene relationship. Imprinting refers to a normal developmental process in which either the paternal or maternal allele is inactivated, depending on the specific locus, thus leading to expression from only one copy of the gene. Disease typically manifests when a deleterious variant is inherited from a parent whose copy of the gene would normally be expressed, but not when a deleterious variant is inherited from a parent whose copy of the gene would normally be inactivated.

Displays anticipation HP:0003743
A phenomenon in which the severity of a disorder increases, or the age of onset decreases, as the disorder is passed from one generation to the next, typically due to expansion of a repeat sequence. For example, Myotonic Dystrophy is caused by triplet repeat expansion in the DMPK gene.

Requires heterozygosity HP:0034343
Covers rare instances of a condition that is most severe in the heterozygous state. Such disorders are rare and currently all are X-linked. Most X-linked recessive conditions manifest if hemizygous in males, or biallelic in females, though they may have a mild phenotype in the heterozygous state in females. However, Craniofrontonasal dysplasia due to EFNB1, and PCDH19-related epilepsy, are both X-linked dominant and paradoxically more severe in females. Hemizygous males may be mildly affected but seldom manifest the full phenotype. Importantly the mutant allele can be inherited from a normal or very mildly affected father. The mechanism is currently accepted to be cellular interference, whereby the two distinct cell populations (those with and without the variant) exhibit abnormal cellular interactions in the mosaic state. This occurs in women, who are functionally mosaic due to random X inactivation, or in mosaic males. The same mechanism could theoretically be observed in autosomal genes with a mosaic variant.

Sex-limited expression HP:0001470
- Male-limited expression HP:0001475
- Female-limited expression HP:0034344
Condition in which the phenotype only manifests in one sex, i.e. either manifests in males or females but not both. Example: Autosomal recessive sex reversal due to DHH on chr12 manifests only in XY males causing gonadal dysgenesis, while XX females are phenotypically normal.

Contiguous gene syndrome HP:0001466
Syndrome caused by the effects of abnormality (typically a deletion or duplication) of 2 or more adjacent genes.

Notes:
• Mitochondrial - the inheritance of a trait encoded in the mitochondrial genome. Persons with mitochondrial disease may be male or female but the mode of inheritance is strictly maternal. No male with the disease can transmit it to their offspring.
• PAR - genes within the pseudoautosomal regions (PAR) are inherited like autosomal genes. PAR1 comprises 2.6 Mb of the short arm of both X and Y chromosomes in humans. PAR2 is at the tip of the long arms, spanning 320 kb. Normal male mammals have two copies of these genes: one in the pseudoautosomal region of their Y chromosome, the other in the corresponding portion of their X chromosome. Normal females also possess two copies of pseudoautosomal genes, as each of their two X chromosomes contains a pseudoautosomal region. Crossing over between the X and Y chromosomes is normally restricted to the pseudoautosomal regions; thus, pseudoautosomal genes exhibit an autosomal, rather than sex-linked, pattern of inheritance. So, females can inherit an allele originally present on the Y chromosome of their father.
• For monoallelic_X_het (X-linked dominant) conditions, we would understand that those diseases manifest when het or hem (or indeed hom/compound het, though this may be more severe or lethal).
• For monoallelic_X_hem (X-linked recessive) conditions, we would understand that these would not manifest when heterozygous (though they can manifest with an ameliorated phenotype, or manifest if there is skewed inactivation etc., i.e. primarily recessive with milder female expression).
• Terms are specific to each disease-gene pair. Consider a hypothetical example of a gene on the X chromosome in which biallelic or hemizygous monoallelic variation causes congenital structural heart abnormalities, but a heterozygous monoallelic variant typically presents with late onset cardiomyopathy. This might be coded as monoallelic_X_hemizygous for congenital heart disease, with appropriate filtering applied in a developmental disorders panel for diagnosis of an infant, and as monoallelic_X_heterozygous (age-related onset) for cardiomyopathy, with different variant filtering applied for a cardiac gene panel analysis in an adult. This has the advantage of tracing the evidence for each disease association.
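The X-linked notes above amount to a small lookup from allelic requirement term to the genotypes in which disease is typically expected. The sketch below is illustrative only: the function `expected_to_manifest`, the zygosity labels and the `requires_heterozygosity` flag are hypothetical names introduced here, and the result is the typical expectation, not a statement about milder carrier phenotypes or skewed X inactivation.

```python
# Illustrative encoding of the X-linked notes above; labels are assumptions,
# not official GenCC or HPO identifiers.
MANIFESTING_ZYGOSITIES = {
    # X-linked dominant: manifests when het or hem (or hom/compound het).
    "monoallelic_X_het": {"heterozygous", "hemizygous",
                          "homozygous", "compound_heterozygous"},
    # X-linked recessive: hemizygous males or biallelic females; not typical hets.
    "monoallelic_X_hem": {"hemizygous", "homozygous", "compound_heterozygous"},
}


def expected_to_manifest(allelic_requirement: str, zygosity: str,
                         requires_heterozygosity: bool = False) -> bool:
    """Return True if the full phenotype is typically expected for this genotype."""
    if requires_heterozygosity:
        # "Requires heterozygosity" modifier: full phenotype only in the het state.
        return zygosity == "heterozygous"
    return zygosity in MANIFESTING_ZYGOSITIES[allelic_requirement]


print(expected_to_manifest("monoallelic_X_hem", "heterozygous"))
print(expected_to_manifest("monoallelic_X_het", "hemizygous",
                           requires_heterozygosity=True))
```

Because terms are specific to each disease-gene pair, a filter of this kind would be applied per curated pair, so the same gene can carry different terms on different panels.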

List variant classes (SO terms) in this gene proven to cause this disease:
Consider whether the disease is associated with:
- missense and in-frame variants
- protein terminating codons (PTCs), also described as premature truncating variants (PTVs), loss of function (LoF) or radical variants. For PTCs, consider whether they are nonsense mediated decay (NMD) competent or not.
In practice it is useful to know whether a gene-disease pair is associated with missense only, truncating only, or both. See matrix below for variant classes and SO terms.
List other variant classes predicted to lead to the same functional consequence:
Other variant classes that could be predicted to lead to the same functional consequence based on inferred mechanism (score 4 or 5, see matrix below) and therefore might cause the same phenotype.
Matrix of six new high-level predicted functional consequences mapped to SO structural consequence terms via a semi-quantitative scale indicating likelihood of each high-level consequence. The semi-quantitative scale is characterized from first principles by expert evaluation.
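The "score 4 or 5" rule for listing other variant classes can be read as a threshold on the matrix. The sketch below assumes the matrix is held as a nested dictionary keyed by SO term and then by high-level consequence; that layout, the function name `likely_variant_classes`, and every score in `toy_matrix` are placeholders for illustration and are not the published matrix values.

```python
# Sketch of selecting variant classes that score 4 or 5 for a given consequence.
# The data layout and all scores below are hypothetical placeholders.
def likely_variant_classes(matrix: dict, consequence: str, min_score: int = 4) -> list:
    """Return SO terms whose score for `consequence` is at least `min_score`."""
    return sorted(
        so_term
        for so_term, scores in matrix.items()
        if scores.get(consequence, 0) >= min_score
    )


# PLACEHOLDER scores only: substitute the values from the actual matrix.
toy_matrix = {
    "stop_gained": {"decreased_gene_product_level": 5,
                    "altered_gene_product_sequence": 2},
    "frameshift_variant": {"decreased_gene_product_level": 4,
                           "altered_gene_product_sequence": 2},
    "missense_variant": {"decreased_gene_product_level": 1,
                         "altered_gene_product_sequence": 5},
}

print(likely_variant_classes(toy_matrix, "decreased_gene_product_level"))
```

Keeping the threshold as a parameter makes it easy to see which additional classes would be pulled in if a curator chose to be more or less inclusive.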

Disease-associated variant consequences:
Once the variant classes associated with the disease are known, map these to the high-level terms using the matrix above.
High level terms to describe variant consequences:
• Altered gene product level - a sequence variant that alters the level or amount of gene product produced. This high-level term can be applied where the direction of level change (increased vs decreased gene product level) is unknown or not confirmed e.g. promoter or enhancer variants, some splice variants.
• Increased gene product level - a variant that increases the level or amount of gene product produced e.g. non-disruptive gene duplications, some promoter or enhancer variants.
• Decreased gene product level - a sequence variant that decreases the level or amount of gene product produced e.g. a 5'UTR variant that reduces protein levels by disrupting translation, a 3'UTR variant that affects RNA stability, splice variants that decrease but do not stop expression, variants leading to nonsense mediated decay (NMD)-competent premature termination codons (PTCs), or gene-disrupting structural variants.
• Absent gene product - a sequence variant that results in no gene product e.g. whole gene or other large scale disruptive structural variant, variants producing NMD-competent PTCs.
• Altered gene product sequence - a sequence variant that alters the sequence of a gene product.
General notes:
- If monoallelic and biallelic inheritance can cause the same disease, they should be recorded as separate entities if biallelic variants lead to a different phenotype (not just a change in severity).
- If a dominant variant can also be seen on both alleles but the outcome is essentially the same disease, then this should be categorised as one entity using dominant and monoallelic.
For example:
- AD and AR DSC2 causing isolated ARVC are one disease-gene pair.
- AR DSC2 causing ARVC with cutaneous manifestations is a separate disease-gene pair.
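The general notes on when monoallelic and biallelic observations form one entity or two can be sketched as a grouping step. This is a minimal illustration under stated assumptions: the `Observation` class and `group_into_entities` function are hypothetical, and the phenotype label is assumed to have been assigned by a curator so that a difference in severity alone does not change it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    gene: str
    phenotype: str            # curator-assigned label; severity alone does not change it
    allelic_requirement: str  # "monoallelic" or "biallelic"


def group_into_entities(observations):
    """Group observations into one entity per distinct (gene, phenotype)."""
    entities = {}
    for obs in observations:
        key = (obs.gene, obs.phenotype)
        entities.setdefault(key, set()).add(obs.allelic_requirement)
    return entities


# The DSC2 example from the text above.
observations = [
    Observation("DSC2", "isolated ARVC", "monoallelic"),
    Observation("DSC2", "isolated ARVC", "biallelic"),
    Observation("DSC2", "ARVC with cutaneous manifestations", "biallelic"),
]
entities = group_into_entities(observations)
for (gene, phenotype), requirements in entities.items():
    print(gene, phenotype, sorted(requirements))
```

In this sketch the isolated ARVC entity collects both allelic states; following the general notes, it would then be categorised once, using dominant and monoallelic, while the cutaneous presentation remains a separate disease-gene pair.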