Documentation

Introduction

The clear majority of genome-wide association studies (GWAS) associate human diseases with variants residing outside coding DNA. Only a small fraction of these GWAS-associated noncoding variants are causal; many lie in regions of high linkage disequilibrium that host either a causal coding variant or a causal noncoding variant in a gene regulatory element. Identifying causal noncoding variants remains challenging because few methods and tools accurately quantify the impact of a noncoding mutation.

A growing body of work has been devoted to quantifying the deleterious effects of noncoding mutations using artificial intelligence and deep learning methods. These methods include DeepSEA, DeepBind, and Basset, which ‘deep learn’ the regulatory sequence code from big genomics data; deltaSVM and deSNPs, which learn sequence features from a single enhancer-associated chromatin profile and consider only the k-mer content associated with the genetic variant; CATO, which predicts chromatin states using high-throughput sequencing data across multiple individuals; C-SCORE, which integrates multiple annotations into one metric by contrasting variants that survived natural selection with simulated mutations; and CAPE, which decomposes the sequence code of potential binding sites and the binding sites of cofactors from a set of chromatin profiles, and directly quantifies the deactivating effect of a single nucleotide mutation based on the corresponding change in the underlying k-mer profile. In our recent study published in Nucleic Acids Research, we compared our method CAPE to several other tools and observed differential accuracy of the various methods in predicting dsQTLs and eQTLs. The accuracy of causal variant prediction varied considerably, from no differentiation to >90%, depending on the selected method and/or cell line. Although some methods generally outperform others, no single method performs best in every scenario.

SNPDelScore offers pre-computed deleterious effects of noncoding variants from a large panel of currently available methods and summarizes this information in an interactive, easy-to-use website. We also provide open access to these data through a RESTful web service available through this website. Additionally, a Python-based command line client for the web service is available and can be used to retrieve the data from other applications and tools.

GWAS

The GWAS Catalog was downloaded from EBI-GWAS.
The version included in the database was: gwas_catalog_v1.0-associations_e88_r2017-04-03.tsv

TFBSs

The TFBSs were created using the program tfbsFrag.
TSV files for each chromosome were created and are available here.

The TSV file format is:

#PWMs	        START	END	STRAND	SEQUENCE		CHROM
UP00109_1	11456	11470	+	ACTGGCGGATTATAG		1
UP00176_1	11456	11471	+	ACTGGCGGATTATAGG	1
M1053_1.02	11459	11468	-	ATAATCCGCC		1
M5501_1.02	11460	11469	-	TATAATCCGC		1
DMBX1_DBD	11460	11469	+	GCGGATTATA		1
DPRX_DBD_1	11460	11469	+	GCGGATTATA		1
M5346_1.02	11460	11469	-	TATAATCCGC		1
        

An additional file was used to map PWM identifiers to gene names.
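The TSV layout shown above can be read with a few lines of Python. The following is a minimal sketch, not part of SNPDelScore: the names `TfbsHit` and `read_tfbs` are illustrative, and it assumes that the real per-chromosome files follow the sample exactly (six whitespace-separated columns and a header line starting with `#`).

```python
import io
from typing import Iterator, NamedTuple


class TfbsHit(NamedTuple):
    """One predicted binding site (hypothetical record type for this sketch)."""
    pwm: str
    start: int
    end: int
    strand: str
    sequence: str
    chrom: str


def read_tfbs(handle) -> Iterator[TfbsHit]:
    """Yield one TfbsHit per data line of a TFBS TSV file."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#PWMs ...' header
        # Split on any run of whitespace: the sample pads columns with
        # a variable number of tabs.
        pwm, start, end, strand, sequence, chrom = line.split()
        yield TfbsHit(pwm, int(start), int(end), strand, sequence, chrom)


# Two rows copied from the sample above.
sample = io.StringIO(
    "#PWMs\tSTART\tEND\tSTRAND\tSEQUENCE\tCHROM\n"
    "UP00109_1\t11456\t11470\t+\tACTGGCGGATTATAG\t\t1\n"
    "M1053_1.02\t11459\t11468\t-\tATAATCCGCC\t\t1\n"
)
hits = list(read_tfbs(sample))
for hit in hits:
    print(hit.pwm, hit.chrom, hit.start, hit.end, hit.strand)
```

In the sample rows, `end - start + 1` equals the length of the sequence, which suggests that both coordinates are inclusive.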

Web Services

The web services allow a client to retrieve the data in JSON or HTML format through a RESTful interface.

RESTful URLs

Retrieve all SNP data

  1. Search SNP by ID, HTML format: https://www.ncbi.nlm.nih.gov/research/snpdelscore/api/snp/?name=rs7417106
  2. Search SNP by ID, JSON format: https://www.ncbi.nlm.nih.gov/research/snpdelscore/api/snp/?name=rs7417106&format=json

Retrieve calculated data

  1. Search SNP by ID, HTML format: https://www.ncbi.nlm.nih.gov/research/snpdelscore/api/snpdata/?name=rs7417106
  2. Search SNP by ID, JSON format: https://www.ncbi.nlm.nih.gov/research/snpdelscore/api/snpdata/?name=rs7417106&format=json
  3. Search SNP by Gene name, HTML format: https://www.ncbi.nlm.nih.gov/research/snpdelscore/api/snpdata/?gene_name=DDX11L1
  4. Search SNP by Gene name, JSON format: https://www.ncbi.nlm.nih.gov/research/snpdelscore/api/snpdata/?gene_name=DDX11L1&format=json
  5. Search SNPs by chromosome, HTML format: https://www.ncbi.nlm.nih.gov/research/snpdelscore/api/snpdata/?chr=chr1
  6. Search SNPs by chromosome, JSON format: https://www.ncbi.nlm.nih.gov/research/snpdelscore/api/snpdata/?chr=chr1&format=json
  7. Search SNPs by chromosome and region, HTML format: https://www.ncbi.nlm.nih.gov/research/snpdelscore/api/snpdata/?chr=chr1&start=0&end=3369847
  8. Search SNPs by chromosome and region, JSON format: https://www.ncbi.nlm.nih.gov/research/snpdelscore/api/snpdata/?chr=chr1&start=0&end=3369847&format=json
  9. Search SNPs by method: CAPE eQTL (use method's number as shown in Methods), HTML format: https://www.ncbi.nlm.nih.gov/research/snpdelscore/api/snpdata/?method=1
  10. Search SNPs by method: CAPE eQTL (use method's number as shown in Methods), JSON format: https://www.ncbi.nlm.nih.gov/research/snpdelscore/api/snpdata/?method=1&format=json
  11. Search SNPs by tissue: GM12878 Lymphoblastoid Cells (use tissue's number as shown in Tissues), HTML format: https://www.ncbi.nlm.nih.gov/research/snpdelscore/api/snpdata/?tissue=33
  12. Search SNPs by tissue: GM12878 Lymphoblastoid Cells (use tissue's number as shown in Tissues), JSON format: https://www.ncbi.nlm.nih.gov/research/snpdelscore/api/snpdata/?tissue=33&format=json
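All of the URLs listed above share one shape: an endpoint (`snp` or `snpdata`), a set of filter parameters, and an optional `format=json`. The sketch below assembles such URLs with the standard library. `BASE_URL` and `build_url` are illustrative names chosen here, not part of the service.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ncbi.nlm.nih.gov/research/snpdelscore/api"


def build_url(endpoint: str, as_json: bool = True, **filters) -> str:
    """Build a query URL for the 'snp' or 'snpdata' endpoint.

    Filters are the query parameters documented above
    (name, gene_name, chr, start, end, method, tissue).
    """
    # Drop filters that were passed as None so callers can forward optional values.
    params = {key: value for key, value in filters.items() if value is not None}
    if as_json:
        params["format"] = "json"
    return f"{BASE_URL}/{endpoint}/?{urlencode(params)}"


print(build_url("snpdata", name="rs7417106"))
print(build_url("snpdata", as_json=False, chr="chr1", start=0, end=3369847))
```

The two calls reproduce items 2 and 7 of the "Retrieve calculated data" list.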

Results in JSON format

The results in JSON format for the calculated data have the following structure:

{
    "count": 4059,
    "next": "https://www.ncbi.nlm.nih.gov/api/snpdata/?page=2",
    "previous": null,
    "results": [
        {
            "name": "rs7417106",
            "pos": 911595,
            "ref": "A",
            "alt": "G",
            "chr": "chr1",
            "method": "CAPE eQTL",
            "tissue": "GM12878 Lymphoblastoid Cells",
            "value": 0.00483957
        }
    ]
}

Global Fields

  • count: total number of SNPs retrieved
  • next: URL to be used to retrieve the next set of results
  • previous: URL to be used to retrieve the previous set of results
  • results: List of results

SNP Fields

  • name: SNP ID
  • pos: SNP position
  • ref: SNP reference
  • alt: SNP alternative
  • chr: Chromosome
  • method: Method used to calculate the prediction
  • tissue: Tissue used
  • value: Prediction value
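The global and SNP fields described above map naturally onto a small typed record. The following sketch parses one page of results; `SnpScore` and `parse_page` are hypothetical names, and the sample text is the example response from this page.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SnpScore:
    """One entry of the 'results' list (illustrative type, not an official schema)."""
    name: str
    pos: int
    ref: str
    alt: str
    chr: str
    method: str
    tissue: str
    value: float


def parse_page(text: str):
    """Return (count, next_url, scores) for one page of JSON results."""
    page = json.loads(text)
    wanted = [field.name for field in fields(SnpScore)]
    # Pick only the documented fields so unexpected extra keys do not break parsing.
    scores = [
        SnpScore(**{key: record[key] for key in wanted})
        for record in page["results"]
    ]
    return page["count"], page["next"], scores


SAMPLE = """
{
    "count": 4059,
    "next": "https://www.ncbi.nlm.nih.gov/api/snpdata/?page=2",
    "previous": null,
    "results": [
        {"name": "rs7417106", "pos": 911595, "ref": "A", "alt": "G",
         "chr": "chr1", "method": "CAPE eQTL",
         "tissue": "GM12878 Lymphoblastoid Cells", "value": 0.00483957}
    ]
}
"""

count, next_url, scores = parse_page(SAMPLE)
print(count, scores[0].name, scores[0].method, scores[0].value)
```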

Combining options

All options can be combined to retrieve specific data. For example, to retrieve the list of SNPs in the region chr5:5689-758812640 calculated with the method CAPE eQTL:

https://www.ncbi.nlm.nih.gov/research/snpdelscore/api/snpdata/?chr=chr5&start=5689&end=758812640&method=1

Web Services Client

The Web Services can be accessed through any external application that can query URLs and parse the JSON output.
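Because JSON results are paginated (see the `count`, `next`, and `previous` fields), an external client usually needs to follow the `next` link until it is null. The sketch below shows one way to do that with the standard library. `fetch_json` and `iter_results` are illustrative names, and the demonstration uses an in-memory stand-in for the server, so no network request is made when the block runs.

```python
import json
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Download one page and decode it as JSON (performs a network request)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_results(first_url: str, fetch=fetch_json):
    """Yield every record, following 'next' links until the last page."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page["next"]  # null (None) on the last page


# Offline demonstration with two fake pages keyed by made-up URLs.
fake_pages = {
    "page1": {"count": 2, "next": "page2", "previous": None,
              "results": [{"name": "rs62635286"}]},
    "page2": {"count": 2, "next": None, "previous": "page1",
              "results": [{"name": "rs544419019"}]},
}
names = [record["name"] for record in iter_results("page1", fetch=fake_pages.__getitem__)]
print(names)

# Against the live service one would call, for example:
# for record in iter_results(
#     "https://www.ncbi.nlm.nih.gov/research/snpdelscore/api/snpdata/"
#     "?name=rs7417106&format=json"
# ):
#     print(record)
```

Passing the download function as a parameter keeps the paging logic testable without contacting the server.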

Python Client

A Python (version 3) script was developed to act as a client for the Web Services. The script can be used to query the RESTful API available through this web application.

The script can be downloaded from here; it uses command line options to retrieve the data.

Options used to query the RESTful API

    -i      Input file. Each line can be a SNP name, a gene name or genome coordinates
    -n      Search by SNP ID
    -g      Search by Gene name
    -c      Search by chromosome name and region. Format: chr1 or chr1:pos or chr1:start-end
    -m      Search by method used
    -t      Search by tissue used
    -b      Print output in BED format
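The three coordinate formats accepted by `-c` correspond to the `chr`, `start`, and `end` query parameters of the web service. The sketch below shows one possible translation. It is not taken from the client script: `region_to_params` is a hypothetical helper, and treating `chr1:pos` as a one-base region (`start` equal to `end`) is an assumption about how a single position should be queried.

```python
import re

# chrN, chrN:pos or chrN:start-end
_REGION = re.compile(r"^(chr\w+)(?::(\d+)(?:-(\d+))?)?$")


def region_to_params(region: str) -> dict:
    """Translate a -c style region string into query parameters."""
    match = _REGION.match(region.strip())
    if match is None:
        raise ValueError(f"not a region: {region!r}")
    chrom, start, end = match.groups()
    params = {"chr": chrom}
    if start is not None:
        params["start"] = int(start)
        # Assumption: a single position is queried as start == end.
        params["end"] = int(end) if end is not None else int(start)
    return params


for text in ("chr1", "chr1:11012", "chr1:10500-15100"):
    print(text, "->", region_to_params(text))
```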

Example 1

Search SNPs on chromosome 1 (region 10500-15100) calculated with CAPE dsQTL and print the output in BED format.

Command line:

#> python snp_rest_client.py -b -m 3 -c chr1:10500-15100

Output:

chr1 13116 rs62635286  T G	1.632919 # Method: deltaSVM, Tissue: Average
chr1 11012 rs544419019 C G	1.357423 # Method: deltaSVM, Tissue: Average
chr1 11012 rs544419019 C G	3.137345 # Method: deltaSVM, Tissue: GM12878 Lymphoblastoid Cells
chr1 13116 rs62635286  T G	0.324705 # Method: deltaSVM, Tissue: GM12878 Lymphoblastoid Cells
chr1 13118 rs200579949 A G	1.696014 # Method: deltaSVM, Tissue: Average
chr1 13118 rs200579949 A G	1.195338 # Method: deltaSVM, Tissue: GM12878 Lymphoblastoid Cells
chr1 13273 rs531730856 G C	0.909246 # Method: deltaSVM, Tissue: GM12878 Lymphoblastoid Cells
...

Example 2

Search SNPs from file and print the output in BED format.

Command line:

#> python snp_rest_client.py -b -i infile

Input file: infile

rs62635286
chr1:11000-11100

Output:

chr1 13116 rs62635286  T G 0.324705 # Method: deltaSVM, Tissue: GM12878 Lymphoblastoid Cells
chr1 13116 rs62635286  T G 2.54928  # Method: deltaSVM, Tissue: HepG2 Hepatocellular Carcinoma
chr1 13116 rs62635286  T G 2.024774 # Method: deltaSVM, Tissue: K562 Leukemia Cells
chr1 11012 rs544419019 C G 3.137345 # Method: deltaSVM, Tissue: GM12878 Lymphoblastoid Cells
chr1 11012 rs544419019 C G 0.599368 # Method: deltaSVM, Tissue: HepG2 Hepatocellular Carcinoma
chr1 11012 rs544419019 C G 0.335556 # Method: deltaSVM, Tissue: K562 Leukemia Cells

Example 3

Search SNPs from file with data calculated for the tissue: K562 Leukemia Cells and print the output in BED format.

Command line:

#> python snp_rest_client.py -b -i infile -t 40

Input file: infile

rs62635286
chr1:11000-11100

Output:

chr1 13116 rs62635286  T G 2.024774 # Method: deltaSVM, Tissue: K562 Leukemia Cells
chr1 11012 rs544419019 C G 0.335556 # Method: deltaSVM, Tissue: K562 Leukemia Cells
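The output lines in the examples above can be consumed by other tools with a small parser. This is a sketch based only on the lines shown on this page: `ScoreLine` and `parse_score_line` are illustrative names, and the layout (six whitespace-separated columns followed by a `# Method: ..., Tissue: ...` comment) is assumed to hold for all output.

```python
import re
from typing import NamedTuple


class ScoreLine(NamedTuple):
    chrom: str
    pos: int
    name: str
    ref: str
    alt: str
    value: float
    method: str
    tissue: str


_COMMENT = re.compile(r"Method:\s*(.+?),\s*Tissue:\s*(.+)$")


def parse_score_line(line: str) -> ScoreLine:
    """Parse one output line of the client as shown in the examples."""
    data, _, comment = line.partition("#")
    # Columns may be separated by spaces or tabs, so split on any whitespace.
    chrom, pos, name, ref, alt, value = data.split()
    match = _COMMENT.search(comment.strip())
    if match is None:
        raise ValueError(f"missing method/tissue comment: {line!r}")
    method, tissue = match.groups()
    return ScoreLine(chrom, int(pos), name, ref, alt, float(value), method, tissue)


record = parse_score_line(
    "chr1 13116 rs62635286  T G 2.024774 # Method: deltaSVM, Tissue: K562 Leukemia Cells"
)
print(record.name, record.value, record.tissue)
```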

How to include data in the SNPDelScore database?

SNPDelScore is based on a set of SNP IDs. Download this VCF file for a complete list of available SNPs (VCF format definition). Currently, SNPDelScore includes 12 591 046 SNPs.

#CHROM	POS	ID	        REF	ALT
chr1	11008	rs575272151	C	G
chr1	11012	rs544419019	C	G
chr1	13110	rs540538026	G	A
chr1	13116	rs62635286	T	G
chr1	13118	rs200579949	A	G
chr1	13273	rs531730856	G	C
chr1	14464	rs546169444	A	T
        

SNPDelScore can import data in the same format, with the predicted value as an extra column. Please note that one file should be submitted per cell line, with the method name and the tissue code in the file name, e.g. CAPE_dsQTL_E003.vcf.gz for the method CAPE-dsQTL and the cell line H1 Cells (E003).

Check our raw data files here

Contact Dr. Ivan Ovcharenko for data submission.

#CHROM	POS	ID	        REF	ALT	VALUE
chr1	11008	rs575272151	C	G	0.9942
chr1	13110	rs540538026	G	A	0.0386
chr1	14930	rs75454623	A	G	0.0685
chr1	15211	rs78601809	T	G	0.2551
chr1	16949	rs199745162	A	C	0.0261
chr1	30923	rs806731	G	T	0.0857
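A submission file in the layout shown above can be produced with the standard library. The sketch below is only an illustration: `submission_filename` and `write_submission` are hypothetical helpers, and the naming rule (hyphens and spaces in the method name replaced by underscores, followed by the tissue code) is inferred from the single example CAPE_dsQTL_E003.vcf.gz.

```python
import gzip
import io

HEADER = "#CHROM\tPOS\tID\tREF\tALT\tVALUE\n"


def submission_filename(method: str, tissue_code: str) -> str:
    """Build a file name such as CAPE_dsQTL_E003.vcf.gz (inferred convention)."""
    safe_method = method.replace("-", "_").replace(" ", "_")
    return f"{safe_method}_{tissue_code}.vcf.gz"


def write_submission(rows, handle) -> None:
    """Write (chrom, pos, id, ref, alt, value) rows as gzipped tab-separated text."""
    with gzip.open(handle, "wt", encoding="utf-8", newline="\n") as out:
        out.write(HEADER)
        for chrom, pos, snp_id, ref, alt, value in rows:
            out.write(f"{chrom}\t{pos}\t{snp_id}\t{ref}\t{alt}\t{value}\n")


# Two rows copied from the example table; written to memory instead of disk.
rows = [
    ("chr1", 11008, "rs575272151", "C", "G", 0.9942),
    ("chr1", 13110, "rs540538026", "G", "A", 0.0386),
]
buffer = io.BytesIO()
write_submission(rows, buffer)
text = gzip.decompress(buffer.getvalue()).decode("utf-8")
print(submission_filename("CAPE-dsQTL", "E003"))
print(text, end="")
```

To write to disk instead, pass a path such as `submission_filename("CAPE-dsQTL", "E003")` as the second argument, since `gzip.open` accepts either a file name or a binary file object.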
        

Contacts

Any questions, comments or requests should be addressed to Dr. Ivan Ovcharenko.

Collaborators