New SNP Attributes

New attributes have been added to dbSNP to allow searching and filtering of human variation by the following characteristics.

Please contact snp-admin@ncbi.nlm.nih.gov if you have any questions or comments.

Each attribute below is followed by its RS count in Build 135, the number of refSNP (rs) records that carry the attribute.

Allele Origin: The rs report summarizes the reported origin(s) of the variant allele, as asserted by each submitter for the submitted SNP (ss). Current values are germline, somatic, and unknown. Additional values will be added in a future release:

  • not-tested
  • tested-inconclusive
  • other

RS count (Build 135): 15313

Clinical Significance: Assertions of clinical significance for alleles of human sequence variations are reported as provided by the submitter and are not interpreted by NCBI. Submissions based on processed OMIM® data were assigned the value ‘probable-pathogenic’, based on a personal communication from Ada Hamosh, director of OMIM. If a published authoritative guideline on the pathogenicity of an allele exists, it is included in the report.

The supported values are:

  • unknown 
  • untested
  • non-pathogenic
  • probable-non-pathogenic
  • probable-pathogenic
  • pathogenic
  • drug-response
  • histocompatibility
  • other

RS count (Build 135): 13105

Global Minor Allele Frequency (MAF): dbSNP reports the minor allele frequency for each rs in a default global population. Because this value is provided to distinguish common polymorphisms from rare variants, the reported MAF is the frequency of the second most frequent allele. For example, if there are 3 alleles with frequencies of 0.50, 0.49, and 0.01, the MAF is reported as 0.49. The current default global population is the 1000 Genomes Project phase 1 genotype data from 1094 worldwide individuals, released in the May 2011 dataset.
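The "second most frequent allele" rule can be sketched in Python as below. This is an illustration only, not dbSNP's pipeline: the function name global_maf and the use of raw allele counts as input are assumptions.

```python
from collections import Counter


def global_maf(allele_counts: dict[str, int]) -> tuple[str, float]:
    """Return (allele, frequency) of the second most frequent allele.

    Per the definition above, the reported MAF is the frequency of the
    second most frequent allele, not the frequency of the rarest one.
    """
    if len(allele_counts) < 2:
        raise ValueError("at least two alleles are needed to report a MAF")
    total = sum(allele_counts.values())
    # most_common(2) lists the two highest counts, largest first;
    # alleles with equal counts keep the order in which they were inserted.
    allele, count = Counter(allele_counts).most_common(2)[1]
    return allele, count / total


if __name__ == "__main__":
    # The three-allele example from the text: frequencies 0.50, 0.49 and 0.01.
    print(global_maf({"A": 50, "G": 49, "T": 1}))  # ('G', 0.49)
```

With these counts the rare allele 'T' (0.01) is ignored and 'G' (0.49) is returned, matching the example in the paragraph.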

For example, the refSNP page for rs222 reports "MAF/MinorAlleleCount:G=0.249/542". This means that for rs222 the minor allele is 'G', with a frequency of 24.9% in the 1000 Genomes phase 1 population, and that 'G' is observed 542 times in the sample population of 1088 people (2176 chromosomes).
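The arithmetic in this example can be checked with a short Python sketch. It assumes the field always has the form allele=frequency/count shown above; the regular expression and the helper names parse_maf_field and chromosome_count are hypothetical.

```python
import re

# Assumed layout of the field on the refSNP page,
# e.g. "MAF/MinorAlleleCount:G=0.249/542".
MAF_FIELD = re.compile(
    r"MAF/MinorAlleleCount:(?P<allele>[ACGT-]+)=(?P<freq>\d*\.?\d+)/(?P<count>\d+)"
)


def parse_maf_field(text: str) -> tuple[str, float, int]:
    """Extract (minor allele, frequency, minor allele count) from the field."""
    match = MAF_FIELD.search(text)
    if match is None:
        raise ValueError(f"no MAF field found in {text!r}")
    return match["allele"], float(match["freq"]), int(match["count"])


def chromosome_count(individuals: int) -> int:
    # Humans are diploid at autosomal loci: two chromosomes per individual.
    return 2 * individuals


if __name__ == "__main__":
    allele, freq, count = parse_maf_field("MAF/MinorAlleleCount:G=0.249/542")
    chromosomes = chromosome_count(1088)
    # 542 / 2176 = 0.24908..., which rounds to the reported 0.249.
    print(allele, freq, count, chromosomes, round(count / chromosomes, 3))
    # G 0.249 542 2176 0.249
```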


RS count (Build 135): 16094233

Suspect: Variations suspected to be false positives, either because of artifacts arising from the presence of a paralogous sequence in the genome (Musumeci et al. 2010; Sudmant et al. 2010) or because evidence suggests sequencing errors or computational artifacts.

RS count (Build 135): 288034

Below are examples of how the attributes are shown on web pages, along with the corresponding schema changes.

RefSNP Summary (Example)

 
A. Allele Origin: indicated as Germline or Somatic for each allele
B. Clinical Significance: click "VarView" or "OMIM" to view the phenotype
C. Global MAF: the minor allele (G), its frequency (0.003), and its allele count (3) are shown
D. Suspected: a red "?" icon is shown in the "Validation" row for a suspected SNP
 

SNP GeneView (Example)

A. Allele Origin: indicated as Germline or Somatic for each allele
B. Clinical Significance: click the icon under "Clinical Channel" to view the effect in Variation Viewer
C. Global MAF: the MAF is shown for the corresponding allele
D. Suspected: a red "?" icon is shown under the "Validation" column for a suspected SNP
 

Variation Viewer (Example)

A. Allele Origin: indicated as Germline or Somatic for the variation under the "Origin" column
B. Clinical Significance: shown under the "Clinical Interpretation" column
C. Global MAF: the frequency is shown under the minor allele frequency (MAF) column
D. Suspected: a red "?" icon is shown under the "Suspect" column for a suspected SNP
 

RS Docsum (XML Schema)

SNP attribute and corresponding XML element:
A. Allele Origin: Rs/AlleleOrigin
B. Clinical Significance: Rs/Phenotype
C. Global MAF: Rs/Frequency
D. Suspected: Rs/Validation/@name='suspect'
 

Entrez Search and Eutils Retrieval

The SNP attributes can be searched from the web using the fields and terms below, or selected from the Limits page. The search can also be performed programmatically using eUtils eSearch. Note: eUtils eFetch is currently being updated to support retrieval of variations with the new attributes. We'll let you know when the update is done.

SNP attribute, search field (type), and search terms:
A. Allele Origin: ALLELE_ORIGIN (text); terms: somatic, germline
B. Clinical Significance: CLINICAL_SIGNIFICANCE (text); terms: non pathogenic, other, pathogenic, probable non pathogenic, probable pathogenic, unknown
C. Global MAF: GLOBAL_MAF (floating-point); exact value, e.g. 0.01[GLOBAL_MAF], or range, e.g. 0.01:0.05[GLOBAL_MAF]
D. Suspected: SUSPECTED (text); terms: paralog
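A minimal Python sketch of composing these field-tagged terms into an eSearch request. It assumes the standard E-utilities eSearch endpoint and the term[FIELD] syntax from the table, and it quotes multi-word terms as phrases; the helper names are hypothetical. It only builds the URL, so it can be fetched with any HTTP client.

```python
from urllib.parse import urlencode

# Standard NCBI E-utilities eSearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field_term(value: str, field: str) -> str:
    """Tag a value with a search field, quoting multi-word values as a phrase."""
    if " " in value:
        value = f'"{value}"'
    return f"{value}[{field}]"


def maf_range(low: float, high: float) -> str:
    """Build a GLOBAL_MAF range term such as 0.01:0.05[GLOBAL_MAF]."""
    return f"{low}:{high}[GLOBAL_MAF]"


def build_query(terms: list[str]) -> str:
    """Combine individual terms with the Boolean AND operator."""
    return " AND ".join(terms)


def esearch_url(query: str, retmax: int = 20) -> str:
    """Return an eSearch URL for the snp database; urlencode escapes the query."""
    params = {"db": "snp", "term": query, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    query = build_query([
        field_term("probable pathogenic", "CLINICAL_SIGNIFICANCE"),
        field_term("germline", "ALLELE_ORIGIN"),
        maf_range(0.01, 0.05),
    ])
    print(query)
    # "probable pathogenic"[CLINICAL_SIGNIFICANCE] AND germline[ALLELE_ORIGIN] AND 0.01:0.05[GLOBAL_MAF]
    print(esearch_url(query))
```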
 
VCF
The four new VCF tags for these attributes are described in the 000-README file on the FTP site (ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/v4.0/) and in the VCF header.
SNP attribute and corresponding VCF tag:
A. Allele Origin: ##INFO=<ID=SAO,Number=1,Type=Integer,Description="SNP Allele Origin: 0 - unspecified, 1 - Germline, 2 - Somatic, 3 - Both">
B. Clinical Significance: ##INFO=<ID=SCS,Number=1,Type=Integer,Description="SNP Clinical Significance, 0 - unknown, 1 - untested, 2 - non-pathogenic, 3 - probable-non-pathogenic, 4 - probable-pathogenic, 5 - pathogenic, 6 - drug-response, 7 - histocompatibility, 255 - other">
C. Global MAF: ##INFO=<ID=GMAF,Number=1,Type=Float,Description="Global Minor Allele Frequency [0, 0.5]; global population is 1000GenomesProject phase 1 genotype data from 629 individuals, released in the 08-04-2010 dataset">
D. Suspected: ##INFO=<ID=SSR,Number=1,Type=Integer,Description="SNP Suspect Reason Code, 0 - unspecified, 1 - Paralog, 2 - byEST, 3 - Para_EST, 4 - oldAlign, 5 - other">
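The code tables in these header lines can be applied mechanically. Below is a minimal Python sketch that decodes the four tags from a VCF INFO column. It assumes plain key=value;flag formatting as in VCF 4.0; the function names and the example INFO string are hypothetical, and unknown codes are reported as "unrecognized" rather than guessed.

```python
# Code tables copied from the ##INFO header lines above.
SAO = {0: "unspecified", 1: "Germline", 2: "Somatic", 3: "Both"}
SCS = {
    0: "unknown", 1: "untested", 2: "non-pathogenic",
    3: "probable-non-pathogenic", 4: "probable-pathogenic", 5: "pathogenic",
    6: "drug-response", 7: "histocompatibility", 255: "other",
}
SSR = {0: "unspecified", 1: "Paralog", 2: "byEST", 3: "Para_EST",
       4: "oldAlign", 5: "other"}


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column into {key: value}; flag tags map to ''."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":  # skip empty entries and a missing INFO
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def decode_attributes(info: str) -> dict[str, object]:
    """Translate the SAO, SCS, GMAF and SSR tags into readable values."""
    fields = parse_info(info)
    decoded: dict[str, object] = {}
    if "SAO" in fields:
        decoded["allele_origin"] = SAO.get(int(fields["SAO"]), "unrecognized")
    if "SCS" in fields:
        decoded["clinical_significance"] = SCS.get(int(fields["SCS"]), "unrecognized")
    if "GMAF" in fields:
        decoded["global_maf"] = float(fields["GMAF"])
    if "SSR" in fields:
        decoded["suspect_reason"] = SSR.get(int(fields["SSR"]), "unrecognized")
    return decoded


if __name__ == "__main__":
    # Hypothetical INFO column; real records usually carry other tags too.
    print(decode_attributes("SAO=1;SCS=5;GMAF=0.249;SSR=0"))
```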

 

Filtered VCF Files for Polymorphic and Clinical Variants (Coming Soon)
Filtered VCF data files will contain selected subsets of dbSNP that act as surrogates for “polymorphic” or “clinically significant” variants. These subsets are defined using minor allele frequencies (MAF) and sources (http://www.ncbi.nlm.nih.gov/projects/SNP/docs/rs_attributes.html#clinical). The individual files will include:

  1) MAF > 0.01 based on 1000 Genomes populations in the May 2011 dataset
  2) MAF > 0.01 for any population submitted to dbSNP
  3) MAF > 0.01 for all populations submitted to dbSNP
  4) variations asserted to be clinically significant by submitters
  5) variations suspected to be false positives
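Until the filtered files are available, a rough approximation of subsets 1 and 5 can be drawn from the full VCF using the GMAF and SSR tags. The Python sketch below is an assumption-laden illustration, not how dbSNP builds its files: it treats GMAF > 0.01 as "common", treats any nonzero SSR code as "suspected", and cannot reproduce subsets 2 and 3, which depend on per-population frequencies not carried in these tags. All function names and records are hypothetical.

```python
from collections.abc import Callable, Iterable, Iterator


def info_dict(info: str) -> dict[str, str]:
    """Parse a VCF INFO column into {key: value}; flag tags get an empty value."""
    return {k: v for k, _, v in (item.partition("=") for item in info.split(";") if item)}


def is_common(info: dict[str, str], min_maf: float = 0.01) -> bool:
    # Approximates subset 1; records without a GMAF tag are dropped.
    return "GMAF" in info and float(info["GMAF"]) > min_maf


def is_suspect(info: dict[str, str]) -> bool:
    # Assumption: any nonzero SSR code marks a suspected false positive.
    return info.get("SSR", "0") != "0"


def filter_vcf(lines: Iterable[str], keep: Callable[[dict[str, str]], bool]) -> Iterator[str]:
    """Yield all header lines plus the data records whose INFO passes `keep`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and the column header line
            continue
        columns = line.rstrip("\n").split("\t")
        # INFO is the 8th fixed VCF column (index 7).
        if keep(info_dict(columns[7])):
            yield line


if __name__ == "__main__":
    records = [  # hypothetical records for illustration only
        "##fileformat=VCFv4.0",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t100\t.\tA\tG\t.\t.\tGMAF=0.249;SSR=0",
        "1\t200\t.\tC\tT\t.\t.\tGMAF=0.003;SSR=1",
    ]
    for line in filter_vcf(records, is_common):
        print(line)
```

Passing a different predicate, such as is_suspect, selects a different subset without changing the streaming loop, which keeps memory use constant on large files.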