HGVS Expressions at NCBI

NCBI recognizes the standards established by the Human Genome Variation Society (HGVS) for describing sequence variation in genomic, RNA, coding DNA, and protein coordinates. This page documents how NCBI's databases process, display, and report HGVS expressions.

Consult the HGVS nomenclature documentation for examples and details about describing sequence variation, and check the validity of your HGVS expression with the Mutalyzer website.

NCBI resources that support the use of HGVS expressions include:

NCBI Databases (support for HGVS expressions in submissions, data searches, and data reports varies by database):

ClinVar
dbSNP
GTR
dbVar
NCBI Tools (support for searching by HGVS expression, calculating HGVS expressions from BED, GVF, or VCF files, and calculating comparable HGVS expressions from a RefSeq varies by tool):

1000 Genomes Browser
Clinical Remap
Variation Reporter
Variation Viewer
See the NCBI Variation Tool page for more information on NCBI resources that analyze variation and let you view variation data.

HGVS Expressions at NCBI: Usage

Nucleotides

NCBI accepts only sequence accession identifiers (such as a RefSeq accession.version), CCDS identifiers, or LRG sequence identifiers. Notation such as "DMD_v1", "chr1", or "rs123456" is not allowed, nor are "c." coordinates on genomic sequences.

Reference Sequence and Accession Notation

HGVS Expression | Will NCBI Process? | Comment
CCDS4702.1:c.123C>T | Yes | A CCDS accession and version is sufficient to define the "c." and "p." coordinate systems.
LRG_391t1:c.1234A>T, LRG_391_t1:c.1234A>T | Yes | LRG sequences can be represented as "LRG_1:g." or "LRG_1_tn:c.", where n is the transcript identifier.
CAV3:c.123T>C | No | We have decided not to process HGVS expressions by gene symbol. Although this expression is properly formed, there are too many cases where splice variants affect the numbering system. We will not guess the intent of the submitter, even if the NCBI databases represent only one cDNA for a given gene.
c.123T>C | No | The expression must provide the reference sequence on which the numbering system is defined. Ideally, the expression should contain an NCBI RefSeq accession and version, e.g., NM_001234.3:c.123T>C.
uc002rvz.2:c.2744A>G | No | NCBI will not accept HGVS expressions constructed with UCSC transcript identifiers.
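
The reference-sequence rules above can be sketched in Python. This is a minimal illustration and not NCBI's validator. The regular expressions and the functions reference_kind and accepted are hypothetical, and they recognize only the identifier shapes that appear in this table.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only; these are not NCBI's actual rules.
REFSEQ = re.compile(r"^[A-Z]{2}_\d+\.\d+$")   # accession.version, e.g. NM_001234.3
CCDS = re.compile(r"^CCDS\d+\.\d+$")          # e.g. CCDS4702.1
LRG = re.compile(r"^LRG_\d+(?:_?[tp]\d+)?$")  # e.g. LRG_391, LRG_391t1, LRG_391_t1


def reference_kind(expression: str) -> Optional[str]:
    """Classify the reference sequence of an HGVS expression, or return None."""
    if ":" not in expression:
        return None  # e.g. "c.123T>C": no reference sequence at all
    reference = expression.split(":", 1)[0]
    if REFSEQ.match(reference):
        return "refseq"
    if CCDS.match(reference):
        return "ccds"
    if LRG.match(reference):
        return "lrg"
    return None  # gene symbols (CAV3), UCSC ids (uc002rvz.2), chr1, rs123456, ...


def accepted(expression: str) -> bool:
    """Apply a simplified version of the reference-sequence rules above."""
    if reference_kind(expression) is None:
        return False
    reference, description = expression.split(":", 1)
    # "c." coordinates on genomic RefSeqs are rejected. The exception for an NG
    # that is also an LRG is not modeled in this sketch.
    if reference.startswith(("NC_", "NG_")) and description.startswith("c."):
        return False
    return True
```

With this sketch, accepted("NM_001234.3:c.123T>C") is True. Both accepted("CAV3:c.123T>C") and accepted("uc002rvz.2:c.2744A>G") are False.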

Substitutions: Allele Change Notation

HGVS Expression | Will NCBI Process? | Comment
LRG_287t1:c.1071G>? | No | Use G>N to indicate a change to an alternative allele whose identity is uncertain.
NM_007294.3:c.2935C > T, NM_007294.3:c.2935 C>T | No | Spaces are not allowed.
NM_007294.3:c.2935T>C | No | The reference nucleotide must be correct.
NM_007294.3:c.2935C | No | A complete expression is required to represent an allele. To represent the reference sequence, see "Alleles Found in Reference Sequence" below.
NM_007294.3:c.2935C-T, NM_007294.3:c.2935C/T, NM_007294.3:c.C2935T, NM_007294.3:c.2935CtoT, NM_007294.3:c.2935C to T | No | Nucleotide alleles must be represented by the > symbol, not by -, /, or "to", and the position must not be written between the reference and altered nucleotides.
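
The substitution rules in this table can be expressed as a single pattern. The sketch below assumes a simplified grammar: a position (an integer, optionally preceded by - or * and optionally followed by a +/- offset), then the reference base, ">", and the alternative base, with no spaces. The pattern and parse_substitution are hypothetical. Unlike NCBI, this sketch does not check the reference base against the sequence.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical simplified grammar: "N" is allowed for an uncertain alternative
# allele, "?" is not. Matching is case-insensitive.
SUBSTITUTION = re.compile(
    r"^(?P<pos>[-*]?\d+(?:[+-]\d+)?)(?P<ref>[ACGT])>(?P<alt>[ACGTN])$",
    re.IGNORECASE,
)


class Substitution(NamedTuple):
    position: str  # kept as text so offsets such as "1631+1" or "*1287" survive
    ref: str
    alt: str


def parse_substitution(expression: str) -> Optional[Substitution]:
    """Parse a "c." substitution such as NM_007294.3:c.2935C>T, or return None."""
    if ":" not in expression:
        return None
    change = expression.split(":", 1)[1]
    if not change.startswith("c."):
        return None
    m = SUBSTITUTION.match(change[2:])
    if m is None:
        return None  # spaces, -, /, "to", C2935T, ?, or a bare base
    return Substitution(m["pos"], m["ref"].upper(), m["alt"].upper())
```

parse_substitution("NM_007294.3:c.2935C>T") returns Substitution(position='2935', ref='C', alt='T'). The spaced, dash, slash, "to", and C2935T forms all return None.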

Substitutions: Introns

HGVS Expression | Will NCBI Process? | Comment
NM_172107.2:c.1631+1G>A | Yes | Nucleotide processing is not case-sensitive, and the nucleotide asserted as the reference is found at that location.
NM_172107.2:c.1629+1G>A | No | For an intronic position, the exon boundary must be correct. Here the exon ends at c.1631 (NM_172107.2:c.1631+1G>A).
NM_172107.2:c.-200C>T | Yes, but note... | NCBI does not interpret this expression as NM_172107.2:c.1-200, which would be 200 nt upstream of the translation initiation site on the genome. Instead, NCBI interprets it as NM_172107.2:c.-177-23: the variant is 23 nt upstream of the beginning of the transcript, which is itself 177 nt upstream of the translation initiation site. The genomic location is therefore affected by any indels or introns upstream of translation initiation in the transcript. Note: positions that extend before the start or past the end of the reference sequence are allowed only in "c." coordinates.
NG_009822.1:c.1437+1G>A | No | Although this expression is valid according to HGVS, NCBI does not process "c." coordinates on NG accessions unless the NG is also an LRG.
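
The c.-200 example reduces to simple arithmetic once the length of the 5' UTR is known. The sketch below assumes a 5' UTR of 177 nt for NM_172107.2, which is implied by the c.-177-23 interpretation above. anchor_upstream is a hypothetical helper. It works only in transcript coordinates and does not model how introns or indels shift the genomic location.

```python
def anchor_upstream(position: int, utr5_length: int) -> str:
    """Re-express a 5' "c." position relative to the first transcript nucleotide.

    utr5_length is the number of transcript nucleotides before c.1, so the
    first transcript nucleotide is c.-utr5_length.
    """
    if position >= 0:
        raise ValueError("expected a negative (5' UTR or upstream) c. position")
    if -position <= utr5_length:
        return f"c.{position}"  # still inside the transcript's 5' UTR
    overhang = -position - utr5_length  # nucleotides upstream of the transcript
    return f"c.-{utr5_length}-{overhang}"
```

anchor_upstream(-200, 177) returns "c.-177-23". A position inside the 5' UTR, such as anchor_upstream(-50, 177), is returned unchanged as "c.-50".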

Substitutions: Exons and UTRs

HGVS Expression | Will NCBI Process? | Comment
NM_172107.2:c.3300G>A | Yes, but note... | NCBI interprets this expression as NM_172107.2:c.*455+226 rather than NM_172107.2:c.1+3299. The variant is 226 nt downstream of the end of the transcript, which is itself 455 nt downstream of the last nucleotide (c.2619) of the termination codon. Note: positions that extend before the start or past the end of the reference sequence are allowed only in "c." coordinates.
NM_007294.3:c.*1287C>T | Yes | This expression represents a variant in the 3' UTR.
Note: Although we accept HGVS expressions, such as NM_172107.2:c.3300G>A above, that describe positions outside the exons of a transcript, we strongly encourage the use of NG RefSeqs in HGVS expressions for such positions.
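
The c.3300 example follows the same arithmetic at the 3' end. The table states that the termination codon ends at c.2619 and that the transcript ends at c.*455 (a 455-nt 3' UTR). Assuming those values, the hypothetical helper anchor_downstream below re-expresses a position past the end of the transcript relative to its last nucleotide. It works only in transcript coordinates.

```python
def anchor_downstream(position: int, cds_end: int, utr3_length: int) -> str:
    """Re-express a "c." position past the stop codon in c.* notation.

    cds_end is the last nucleotide of the termination codon. utr3_length is the
    number of transcript nucleotides after it, so the transcript ends at
    c.*utr3_length.
    """
    if position <= cds_end:
        return f"c.{position}"  # within the coding sequence: unchanged
    star = position - cds_end  # distance past the termination codon
    if star <= utr3_length:
        return f"c.*{star}"  # inside the 3' UTR
    return f"c.*{utr3_length}+{star - utr3_length}"  # beyond the transcript end
```

anchor_downstream(3300, 2619, 455) returns "c.*455+226" (3300 - 2619 = 681, and 681 - 455 = 226).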

Insertions/Deletions

HGVS Expression | Will NCBI Process? | Comment
NM_005359.5:c.1_2AT>ACGT | Yes | Interpreted as NM_005359.5:c.1_2delATinsACGT.
NM_005359.5:c.189_190insGTG, NM_005359.5:c.189_190insAF301222.1:c.1_44 | Yes | If the inserted sequence is 15 bp or less, the inserted nucleotides are included in the HGVS expression. If the insertion is longer than 15 bp, the inserted sequence should be given as a reference to an INSDC record.
NM_005359.5:c.189_197delAAATGGAGC, NM_005359.5:c.1_197del197 | Yes | If the deleted sequence is 15 bp or less, the deleted nucleotides are included in the HGVS expression. If the deletion is longer than 15 bp, the HGVS expression shows the length of the deletion but not the individual nucleotides.
NM_005359.5:c.189_197delAAATGGAGCins44* | No | This expression is an incomplete description of the variant because it does not provide a reference to the INSDC record for the inserted sequence.
* Support for insNNN is resource-specific. Currently, ClinVar fully supports HGVS expressions containing insNNN, and dbVar allows submissions and FTP downloads of insNNN data.
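
Two of the rows above can be checked mechanically. The first is rewriting a range REF>ALT change in delins form. The second is spotting an insertion given only as a length (insNNN), which lacks a reference to an INSDC record. The functions and patterns below are hypothetical sketches that cover only the simple forms shown in the table.

```python
import re

# Hypothetical pattern for a range REF>ALT change such as c.1_2AT>ACGT.
RANGE_CHANGE = re.compile(
    r"^c\.(?P<start>\d+)_(?P<end>\d+)(?P<ref>[ACGT]+)>(?P<alt>[ACGT]+)$"
)
# An insertion described only by its length, such as ...ins44.
LENGTH_ONLY_INSERTION = re.compile(r"ins\d+$")


def to_delins(expression: str) -> str:
    """Rewrite a range REF>ALT change in delins form; return others unchanged."""
    reference, change = expression.split(":", 1)
    m = RANGE_CHANGE.match(change)
    if m is None:
        return expression
    start, end = int(m["start"]), int(m["end"])
    if end - start + 1 != len(m["ref"]):
        raise ValueError("reference allele length does not match the range")
    return f"{reference}:c.{start}_{end}del{m['ref']}ins{m['alt']}"


def is_length_only_insertion(expression: str) -> bool:
    """True for insNNN-style descriptions that give a length but no sequence."""
    return LENGTH_ONLY_INSERTION.search(expression) is not None
```

to_delins("NM_005359.5:c.1_2AT>ACGT") returns "NM_005359.5:c.1_2delATinsACGT". is_length_only_insertion flags NM_005359.5:c.189_197delAAATGGAGCins44 but not NM_005359.5:c.189_190insGTG.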

Note: Additional examples of HGVS expressions representing insertion/deletion variants can be found on the Variation Reporter's Variation Format Example page.

Repeats

HGVS Expression | Will NCBI Process? | Comment
NG_016832.1:g.5027CGG(8_14) | Yes | For short repeats, the HGVS expression should include the position of the first nucleotide of the repeat, the repeat sequence, and, in parentheses, the range of repeat copy numbers separated by an underscore, not a dash.
NG_016832.1:g.5027CGG(8-14) | No | The range indicating the number of copies of a repeat must use an underscore, not a dash.
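
A minimal parser for the short-repeat form above might look like the following. The pattern and parse_repeat are hypothetical. They accept only the g.<position><repeat unit>(<min>_<max>) shape used in this table.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: g.<first position><repeat unit>(<min copies>_<max copies>)
REPEAT = re.compile(
    r"^g\.(?P<start>\d+)(?P<unit>[ACGT]+)\((?P<low>\d+)_(?P<high>\d+)\)$"
)


class RepeatRange(NamedTuple):
    start: int       # position of the first nucleotide of the repeat
    unit: str        # the repeated sequence
    min_copies: int
    max_copies: int


def parse_repeat(expression: str) -> Optional[RepeatRange]:
    """Parse a short-repeat expression; a dash-separated range is rejected."""
    if ":" not in expression:
        return None
    m = REPEAT.match(expression.split(":", 1)[1])
    if m is None:
        return None  # includes the dash form, e.g. (8-14)
    low, high = int(m["low"]), int(m["high"])
    if low > high:
        return None
    return RepeatRange(int(m["start"]), m["unit"], low, high)
```

parse_repeat("NG_016832.1:g.5027CGG(8_14)") returns RepeatRange(start=5027, unit='CGG', min_copies=8, max_copies=14). The (8-14) form returns None.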

Alleles Found in Reference Sequence

HGVS Expression | Will NCBI Process? | Comment
NM_020485.4:c.676G=, NM_020485.4:c.676G>G, NM_020485.4:c.676= | Yes | These expressions represent an allele found in the reference sequence.
NM_020485.4:c.676G | No | This expression cannot be used to represent the allele found in the reference sequence because it may also be an incomplete HGVS expression intended to represent a change.
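
The difference between the accepted reference-allele forms and the rejected bare form can be sketched as follows. The patterns and classify are hypothetical and handle only single-position "c." descriptions.

```python
import re

# Hypothetical patterns for single-position "c." descriptions.
REF_EQUALS = re.compile(r"^c\.\d+[ACGT]?=$", re.IGNORECASE)  # c.676G= or c.676=
REF_SAME = re.compile(r"^c\.\d+(?P<ref>[ACGT])>(?P<alt>[ACGT])$", re.IGNORECASE)
BARE_BASE = re.compile(r"^c\.\d+[ACGT]$", re.IGNORECASE)  # c.676G


def classify(expression: str) -> str:
    """Return "reference", "change", "ambiguous" (rejected), or "other"."""
    if ":" not in expression:
        return "other"
    change = expression.split(":", 1)[1]
    if REF_EQUALS.match(change):
        return "reference"
    m = REF_SAME.match(change)
    if m:
        # c.676G>G names the same base on both sides, so it is the reference allele.
        return "reference" if m["ref"].upper() == m["alt"].upper() else "change"
    if BARE_BASE.match(change):
        return "ambiguous"  # may be an incomplete description of a change
    return "other"
```

All three accepted forms classify as "reference". NM_020485.4:c.676G classifies as "ambiguous", which corresponds to the "No" row above.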

Proteins

NCBI accepts only sequence accession identifiers, CCDS identifiers, or LRG identifiers. An LRG protein sequence should be written as "LRG_1_pn:p.", where n is the protein identifier.

HGVS Expression | Will NCBI Process? | Comment
NP_009225.1:p.Cys64Gly, NP_009225.1:p.CYS64GLY, NP_009225.1:p.C64G, LRG_292p1:p.Cys64Gly | Yes | NCBI will process expressions in which amino acids are represented by 3-letter codes with the first or all letters capitalized, or by capitalized single-letter codes.
NP_000585.2:p.Ile40STOP | No | NCBI will process expressions containing "Ter", "X", or "*" to indicate a change resulting in a nonsense codon, but will not process expressions containing the word "STOP".
NP_009225.1:p.cys64gly, NP_009225.1:p.c64g | No | NCBI will not process expressions in which amino acids are written entirely in lowercase.
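
The case and stop-codon rules in this table can be sketched as a normalizer that maps every accepted form to 3-letter codes, with "Ter" for a stop. normalize_protein_substitution is hypothetical and handles only simple missense and nonsense substitutions. It treats "TER" in all capitals as unsupported because the table does not list that form.

```python
import re
from typing import Optional

# Standard one-letter to three-letter amino acid codes.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
THREE_LETTER = set(ONE_TO_THREE.values())
STOP_SYMBOLS = {"Ter", "X", "*"}  # "STOP" is deliberately absent

# Hypothetical pattern for a simple substitution such as p.Cys64Gly or p.C64G.
PROTEIN_SUB = re.compile(
    r"^p\.(?P<ref>[A-Za-z]{1,3})(?P<pos>\d+)(?P<alt>[A-Za-z]{1,4}|\*)$"
)


def normalize_residue(token: str) -> Optional[str]:
    """Return the 3-letter code for an accepted residue token, else None."""
    if token in STOP_SYMBOLS:
        return "Ter"
    if len(token) == 3 and token in (token.capitalize(), token.upper()):
        code = token.capitalize()  # "Cys" or "CYS" -> "Cys"; "cys" never gets here
        return code if code in THREE_LETTER else None
    if len(token) == 1 and token in ONE_TO_THREE:  # keys are upper case only
        return ONE_TO_THREE[token]
    return None


def normalize_protein_substitution(expression: str) -> Optional[str]:
    """Rewrite an accepted protein substitution with 3-letter codes, or return None."""
    if ":" not in expression:
        return None
    reference, change = expression.split(":", 1)
    m = PROTEIN_SUB.match(change)
    if m is None:
        return None
    ref, alt = normalize_residue(m["ref"]), normalize_residue(m["alt"])
    if ref is None or alt is None:
        return None
    return f"{reference}:p.{ref}{m['pos']}{alt}"
```

normalize_protein_substitution("NP_009225.1:p.C64G") returns "NP_009225.1:p.Cys64Gly". Both "NP_000585.2:p.Ile40STOP" and "NP_009225.1:p.cys64gly" return None.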

HGVS Expressions at NCBI: Exceptions

Use of Parentheses

Although parentheses are used in HGVS expressions to indicate predicted consequences (uncertainty), NCBI has decided not to include parentheses in the HGVS expressions that we generate and report (particularly for proteins), to make our HGVS expressions easier to read. However, we will accept, and display without conversion, any submitted expressions in which uncertainty is indicated by parentheses.

Note: ClinVar will report HGVS expressions with parentheses if they are provided in submitted "r." expressions (e.g., r.(?)).

Reporting Protein Positions

NCBI reports the projected position in protein coordinates so that users know the general location of a deletion's effect on a protein sequence identified by accession.version. This is relevant for deletions that are upstream of a gene or that include part of an intron or splice site. However, when a deletion is projected from genomic coordinates to protein coordinates, its actual effect on the protein may not be predictable. NCBI will report a protein location only if:

  • the genomic location of the variant is within (or no more than 5 kb beyond) the span of the RNA, and
  • the variation maps within the coding sequence on the RNA
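
The two conditions can be written as a small predicate. This sketch rests on assumptions that the text above does not state. reports_protein_position, its arguments, and the 1-based inclusive single-strand coordinates are hypothetical. The whole variant must lie within the extended span. "Maps within the coding sequence" is read as "overlaps a coding segment", which is consistent with deletions that include part of an intron but may not match NCBI's exact rule.

```python
from typing import Sequence, Tuple

FLANK = 5_000  # bp allowed beyond either end of the RNA span ("no more than 5 kb")


def reports_protein_position(var_start: int, var_end: int,
                             rna_start: int, rna_end: int,
                             cds_segments: Sequence[Tuple[int, int]]) -> bool:
    """Apply both conditions to a variant at genomic var_start..var_end.

    cds_segments are hypothetical genomic (start, end) intervals covering the
    coding parts of the exons. Coordinates are 1-based and inclusive.
    """
    within_span = rna_start - FLANK <= var_start and var_end <= rna_end + FLANK
    # Assumption: "maps within the coding sequence" is read as "overlaps it".
    maps_to_cds = any(var_start <= end and start <= var_end
                      for start, end in cds_segments)
    return within_span and maps_to_cds
```

Consider an RNA spanning 10,000 to 20,000 with a coding segment at 11,000 to 11,200. A deletion at 10,900 to 11,050 qualifies under this sketch. A purely intronic variant at 12,000 to 12,100 does not.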

NCBI Use of Selected HGVS Expressions

For some types of variation, the HGVS nomenclature standards allow several representations. For insertions and deletions, NCBI has adopted different HGVS representations depending on the size of the insertion or deletion being described.

Deletions

If a deleted sequence is 15 bp or less, the deleted nucleotides are included in the HGVS expression. If a deletion is longer than 15 bp, the HGVS expression shows the length of the deletion but not the individual nucleotides. NCBI chose this cutoff to maximize the utility of HGVS expressions and ease of data display.
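
The deletion rule can be sketched as follows. format_deletion is a hypothetical helper. It assumes the deleted sequence is known and uses positive "c." positions only.

```python
CUTOFF = 15  # bp; longer deletions are written as a length


def format_deletion(reference: str, start: int, end: int, deleted: str) -> str:
    """Describe a deletion of c.start..c.end (positive c. positions only)."""
    if len(deleted) != end - start + 1:
        raise ValueError("deleted sequence length does not match the range")
    location = str(start) if start == end else f"{start}_{end}"
    # 15 bp or less: spell out the nucleotides; otherwise give only the length.
    detail = deleted if len(deleted) <= CUTOFF else str(len(deleted))
    return f"{reference}:c.{location}del{detail}"
```

format_deletion("NM_005359.5", 189, 197, "AAATGGAGC") returns "NM_005359.5:c.189_197delAAATGGAGC". A 197-nt deletion at c.1_197 is written "NM_005359.5:c.1_197del197".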

For examples of the HGVS expressions that NCBI accepts for insertions and deletions of 15 bp or less and of more than 15 bp, see the "Insertions/Deletions" portion of the "HGVS Expressions at NCBI: Usage" section of this document.

Insertions

If an inserted sequence is 15 bp or less, the inserted nucleotides are included in the HGVS expression. If an insertion is longer than 15 bp, the inserted sequence is represented as a reference to an INSDC record. NCBI chose this cutoff to maximize the utility of HGVS expressions and ease of data display.
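
The insertion rule can be sketched the same way. format_insertion is hypothetical. The INSDC reference is passed in as an opaque string because this sketch neither constructs nor validates INSDC record references.

```python
from typing import Optional

CUTOFF = 15  # bp; longer insertions are referenced to an INSDC record


def format_insertion(reference: str, left: int, inserted: Optional[str] = None,
                     insdc_source: Optional[str] = None) -> str:
    """Describe an insertion between c.left and c.left+1 (positive c. positions only)."""
    site = f"{reference}:c.{left}_{left + 1}ins"
    if inserted is not None and len(inserted) <= CUTOFF:
        return site + inserted  # short insertion: spell out the nucleotides
    if insdc_source is None:
        raise ValueError("give the inserted sequence (15 bp or less) "
                         "or a reference to an INSDC record")
    return site + insdc_source  # long insertion: cite the INSDC record instead
```

format_insertion("NM_005359.5", 189, inserted="GTG") returns "NM_005359.5:c.189_190insGTG". For an insertion longer than 15 bp, the caller must supply insdc_source.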

For examples of the HGVS expressions that NCBI accepts for insertions and deletions of 15 bp or less and of more than 15 bp, see the "Insertions/Deletions" portion of the "HGVS Expressions at NCBI: Usage" section of this document.

insNNN

NCBI support for submissions that specify insertion length but not insertion sequence (insNNN) is resource-specific. Currently, ClinVar supports insNNN in HGVS expressions, and dbVar has limited support for insNNN through submissions and FTP access to insertion length and sequence (if available).

NCBI Use of Gene Symbols

NCBI has decided not to process HGVS expressions by gene symbol because there are too many cases where splice variants affect the numbering system. We will not guess the intent of the submitter, even if the NCBI databases represent only one cDNA for a given gene.

Last updated: 2017-11-16T17:59:39Z