PLoS Genet. 2019 Sep 25;15(9):e1008382. doi: 10.1371/journal.pgen.1008382. eCollection 2019 Sep.

Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development.

Author information

1 Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America.
2 Graduate Program in Quantitative and Computational Biology, Princeton University, Princeton, New Jersey, United States of America.
3 Center for Computational Biology, Flatiron Institute, New York, New York, United States of America.
4 Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany.
5 Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America.

Abstract

Comprehensive information on the timing and location of gene expression is fundamental to our understanding of embryonic development and tissue formation. While high-throughput in situ hybridization projects provide invaluable information about developmental gene expression patterns for model organisms like Drosophila, the output of these experiments is primarily qualitative, and a high proportion of protein-coding genes and most non-coding genes lack any annotation. Accurate data-centric predictions of spatio-temporal gene expression will therefore complement current in situ hybridization efforts. Here, we applied a machine learning approach by training models on all public gene expression and chromatin data, even from whole-organism experiments, to provide genome-wide, quantitative spatio-temporal predictions for all genes. We developed structured in silico nano-dissection, a computational approach that predicts gene expression in >200 tissue-developmental stages. The algorithm integrates expression signals from a compendium of 6,378 genome-wide expression and chromatin profiling experiments in a cell lineage-aware fashion. We systematically evaluated our performance via cross-validation and experimentally confirmed 22 new predictions for four different embryonic tissues. The model also predicts complex, multi-tissue expression and developmental regulation with high accuracy. We further show the potential of applying these genome-wide predictions to extract tissue specificity signals from non-tissue-dissected experiments, and to prioritize tissues and stages for disease modeling. This resource, together with the exploratory tools, is freely available at our webserver, http://find.princeton.edu, which provides a valuable tool for a range of applications, from predicting spatio-temporal expression patterns to recognizing tissue signatures from differential gene expression profiles.

PMID: 31553718
DOI: 10.1371/journal.pgen.1008382
Free PMC Article

Conflict of interest statement

The authors have declared that no competing interests exist.
