Hum Mutat. 2019 Sep;40(9):1280-1291. doi: 10.1002/humu.23797. Epub 2019 Jun 23.

Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay.

Author information

1. Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland.
2. The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel.
3. Department of Plant and Microbial Biology, University of California, Berkeley, California.
4. Department of Computational Medicine and Bioinformatics and Department of Human Genetics, University of Michigan, Ann Arbor, Michigan.
5. MRC Biostatistics Unit, University of Cambridge, UK.
6. Department of Bioengineering and Therapeutic Sciences and Institute for Human Genetics, University of California San Francisco, San Francisco, California.
7. Department of Genome Sciences, University of Washington, Seattle, Washington.
8. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland.
9. Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia.
10. School of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia.
11. Berlin Institute of Health (BIH), Berlin, Germany.
12. Charité - Universitätsmedizin Berlin, Berlin, Germany.
13. Department of Electrical Engineering and Computer Science and Center for Computational Biology, University of California, Berkeley, California.
14. Institute of Mathematical Problems of Biology, Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Pushchino, Russia.
15. Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia.
16. Alan Turing Institute, British Library, London, UK.

Abstract

The integrative analysis of high-throughput reporter assays, machine learning, and profiles of epigenomic chromatin state in a broad array of cells and tissues has the potential to significantly improve our understanding of noncoding regulatory element function and its contribution to human disease. Here, we report results from the CAGI 5 regulation saturation challenge where participants were asked to predict the impact of nucleotide substitution at every base pair within five disease-associated human enhancers and nine disease-associated promoters. A library of mutations covering all bases was generated by saturation mutagenesis and altered activity was assessed in a massively parallel reporter assay (MPRA) in relevant cell lines. Reporter expression was measured relative to plasmid DNA to determine the impact of variants. The challenge was to predict the functional effects of variants on reporter expression. Comparative analysis of the full range of submitted prediction results identifies the most successful models of transcription factor binding sites, machine learning algorithms, and ways to choose among or incorporate diverse datatypes and cell-types for training computational models. These results have the potential to improve the design of future studies on more diverse sets of regulatory elements and aid the interpretation of disease-associated genetic variation.

KEYWORDS: MPRA; enhancers; gene regulation; machine learning; promoters; regulatory variation

PMID: 31106481
PMCID: PMC6879779 [Available on 2020-09-01]
DOI: 10.1002/humu.23797
