Hum Mutat. 2019 Sep;40(9):1299-1313. doi: 10.1002/humu.23820. Epub 2019 Jun 18.

Meta-analysis of massively parallel reporter assays enables prediction of regulatory function across cell types.

Author information

1 Department of Electrical Engineering and Computer Sciences, Center for Computational Biology, University of California, Berkeley, California.
2 Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, California.
3 Ragon Institute of MGH MIT and Harvard, Cambridge, Massachusetts.
4 Chan Zuckerberg Biohub, San Francisco, California.

Abstract

Deciphering the potential of noncoding loci to influence gene regulation has been the subject of intense research, with important implications in understanding genetic underpinnings of human diseases. Massively parallel reporter assays (MPRAs) can measure regulatory activity of thousands of DNA sequences and their variants in a single experiment. With the increasing number of publicly available MPRA data sets, one can now develop data-driven models which, given a DNA sequence, predict its regulatory activity. Here, we performed a comprehensive meta-analysis of several MPRA data sets in a variety of cellular contexts. We first applied an ensemble of methods to predict MPRA output in each context and observed that the most predictive features are consistent across data sets. We then demonstrate that predictive models trained in one cellular context can be used to predict MPRA output in another, with loss of accuracy attributed to cell-type-specific features. Finally, we show that our approach achieves top performance in the Fifth Critical Assessment of Genome Interpretation "Regulation Saturation" Challenge for predicting effects of single-nucleotide variants. Overall, our analysis provides insights into how MPRA data can be leveraged to highlight functional regulatory regions throughout the genome and can guide effective design of future experiments by better prioritizing regions of interest.

KEYWORDS:

SNVs; functional genomics; gene regulation; machine learning; massively parallel reporter assays; regulatory variation
