Format

Send to

Choose Destination
Hum Mutat. 2017 Sep;38(9):1251-1258. doi: 10.1002/humu.23185. Epub 2017 Mar 15.

Predicting enhancer activity and variant impact using gkm-SVM.

Author information

1
McKusick-Nathans Institute of Genetic Medicine and Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland.

Abstract

We participated in the Critical Assessment of Genome Interpretation eQTL challenge to further test computational models of regulatory variant impact and their association with human disease. Our prediction model is based on a discriminative gapped-kmer SVM (gkm-SVM) trained on genome-wide chromatin accessibility data in the cell type of interest. The comparisons with massively parallel reporter assays (MPRA) in lymphoblasts show that gkm-SVM is among the most accurate prediction models even though all other models used the MPRA data for model training, and gkm-SVM did not. In addition, we compare gkm-SVM with other MPRA datasets and show that gkm-SVM is a reliable predictor of expression and that deltaSVM is a reliable predictor of variant impact in K562 cells and mouse retina. We further show that DHS (DNase-I hypersensitive sites) and ATAC-seq (assay for transposase-accessible chromatin using sequencing) data are equally predictive substrates for training gkm-SVM, and that DHS regions flanked by H3K27Ac and H3K4me1 marks are more predictive than DHS regions alone.

KEYWORDS:

MPRA; eQTL analysis; enhancers; gene regulation; machine learning; regulatory variation

PMID:
28120510
PMCID:
PMC5526747
DOI:
10.1002/humu.23185
[Indexed for MEDLINE]
Free PMC Article

Supplemental Content

Full text links

Icon for Wiley Icon for PubMed Central
Loading ...
Support Center